Abstract

Given a finite typed rooted tree T with n vertices, the empirical subtree measure is the uniform 
measure on the n typed subtrees of T formed by taking all descendants of a single vertex. We 
prove a large deviation principle in n, with explicit rate function, for the empirical subtree 
measures of multitype Galton-Watson trees conditioned to have exactly n vertices. In the
process, we extend the notions of shift-invariance and specific relative entropy (as typically
understood for Markov fields on deterministic graphs such as $\mathbb{Z}^d$) to Markov fields on random
trees. We also develop single-generation empirical measure large deviation principles for a 
more general class of random trees including trees sampled uniformly from the set of all trees 
with n vertices.

Watson tree, multitype Galton-Watson process, multitype Galton-Watson
tree, marked tree, large deviation principle, empirical pair measure, empir-

1. Introduction

The empirical measures of Markov fields on large, deterministic subsets $A$ of $\mathbb{Z}^d$, and the limit points of
these empirical measures, play a central role in statistical physics and the theory of Gibbs measures.
The limit points are always shift-invariant, and the rate functions of the empirical measure large
deviation principles are generally defined in terms of specific relative entropy or specific free energy,
see, e.g., Chapters 14-16 of [Ge88].

When $\mathbb{Z}^d$ is replaced with a random graph, the large deviation analysis of even the simplest models
(say, Ising or Potts models) becomes more difficult. How does one even define "shift-invariance," for
example, when the graphs on which the models are defined are random and almost surely possess no 
translational symmetries? What is the most natural analog of "specific relative entropy"? For that 
matter, what is the most useful definition of "empirical measure"? 

The purpose of this paper is to answer the above questions for some natural random planar rooted 
tree models. By planar we mean that the offspring of each vertex are implicitly ordered from left to
right; this ordering determines an embedding of the tree in the plane.

Given a finite planar rooted tree $T$ with $n$ vertices with types drawn from a finite type set $\mathcal{X}$, the
empirical subtree measure $\nu_T$ is the uniform measure on the $n$ typed subtrees of $T$ that are formed
by taking all descendants of a single vertex of $T$. We will prove a large deviation principle, with an
explicit rate function defined in terms of specific relative entropy, for the empirical subtree measures
of multitype Galton-Watson trees conditioned to have exactly $n$ vertices.

The rate function of this large deviation principle will be infinite on measures that lack a natural
"shift-invariance" property. A shift-invariant measure $\nu$ on trees may be either almost surely finite or
almost surely infinite. In either case, we will show that every shift-invariant measure can be "extended 
backwards" to describe the "infinite past" of a sample from the tree. We may also view this backward 
tree construction as a general technique for examining the steady state of a randomly expanding 
system. It is on these backward tree measures that we will actually define specific relative entropy, as 
the conditional entropy of the offspring measure at the root given its infinite past. 

One motivation for pursuing this problem is the study of tree-indexed Markov chains, defined as follows. 
First we sample a tree from some probability measure, and then, given this tree, we run a Markov 
chain on the vertices of the tree in such a way that the state of a vertex depends only on the state of its 
parent. The result of this two-step experiment can also be interpreted as a typed tree. We always look 
at probabilities with respect to the whole experiment, or, in the language of random environments, 
at the annealed probabilities. These tree-indexed processes are a natural concept of increasing interest
in probability and applications (see, e.g., [BP94], [Pe95] and [LPP95]), often as a new way of looking
at existing models. Our analysis will show that large deviations results, which are well-known for 
classical Markov chains, can be extended to Markov chains indexed by random trees. 

When we restrict our attention to a single generation of the empirical measure (the "empirical offspring 
measure") or to a type of empirical measure on typed edges (the "empirical pair measure") we will 
obtain a generalized large deviation principle for which the classical Markov results (as developed in, 
e.g., [DZ98] and the references therein) are a special case. In fact, these turn out to be among the rare
problems for which large deviation rates can be stated completely explicitly in a closed form. Indeed, 
the rates we find in this setting are hardly more complicated than the rates for classical Markov chains. 
For example, our rate functions are simple enough to allow one to compute the pressure and related 
macroscopic quantities for Gibbs measures corresponding to a short-range potential with configuration 
space that is the set of all typed rooted trees of n vertices with types in X. This is in sharp contrast 
with the large deviation principle for the distance from the root of simple random walk on supercritical 
Galton-Watson trees, for which no explicit rate function is known, see [DGPZ02].

In another application, from the case of binary trees and uniform distribution of types, we calculate 
an explicit growth rate for the total number of binary trees of size n (odd) with types in a finite 
alphabet X, which have an empirical pair measure in a given set of measures. In [KM02] the analogous
combinatorial formula for the number of tuples of length n with a given empirical pair measure was 
used to analyse the tail behaviour of Brownian intersection local times. We hope that the formulas 
derived here give rise to a similar analysis of the tail behaviour of integrated super-Brownian excursion, 
as formulas for high moments of intersection local times involve summation over large binary trees, 
see e.g. [E2S|. 

There are a number of technical issues that make the analysis of tree-indexed Markov chains more 
complicated than the analogous work for classical Markov chains. One arises from the fact that, for 
some models of Galton-Watson trees, the probability of having exactly $n$ vertices is zero for $n$ in
an infinite subset of $\mathbb{Z}$. It is therefore necessary to restrict our attention to those $n$ for which the
probability is positive and to prove lower bounds on probabilities that apply only for select values
of $n$. Another arises from the possibility of an unbounded number of offspring at a single step, which
necessitates the use of a technical "mass exchange" argument in Lemma 3.6.

The precise statements of our results are given in Section 2, beginning with empirical pair and empirical
offspring measures and then progressing to the empirical subtree measures. The former results will
apply to a larger class of random trees than the latter, which will only be proved for bounded-offspring
multitype Galton-Watson trees. The proofs of all of these results are then given in Section 3.

2. Statement of the results

By $\mathcal{T}$ we denote the set of all finite rooted planar trees $T$, by $V = V(T)$ the set of all vertices and by
$E = E(T)$ the set of all edges oriented away from the root, which is always denoted by $\rho$. We write
$|T|$ for the number of vertices in the tree $T$. The $k$-th generation of $T$ is the subset of vertices
of $T$ at distance $k$ from its root, and the height of $T$ is the largest $k$ such that the $k$-th generation of
$T$ is non-empty.

Suppose that $T$ is any finite tree and we are given an initial probability measure $\mu$ and a Markovian
transition kernel $Q : \mathcal{X} \times \mathcal{X} \to [0, 1]$ on a finite alphabet $\mathcal{X}$. We can obtain a tree-indexed Markov
chain $X : V \to \mathcal{X}$ by choosing $X(\rho)$ according to $\mu$ and choosing $X(v)$, for each vertex $v \neq \rho$, using
the transition kernel given the value of its parent, independently of everything else. If the tree is
chosen randomly, we always consider $X = \{X(v) : v \in T\}$ under the joint law of tree and chain. It
is sometimes convenient to interpret $X$ as a typed tree, considering $X(v)$ as the type of the vertex $v$.

We first look at the class of Galton-Watson trees, where the number of children $N(v)$ of each $v \in T$
is an independent random variable, with the same law $p(\cdot) = \mathbb{P}\{N(v) = \cdot\}$ for all $v \in T$, such that
$0 < p(0) < 1$. With each finite tree and sample path $X$ we associate a probability measure on $\mathcal{X} \times \mathcal{X}$,
the empirical pair measure $L_X$, by

\[
L_X(a, b) = \frac{1}{|E|} \sum_{e \in E} \delta_{(X(e_1), X(e_2))}(a, b), \quad \text{for } a, b \in \mathcal{X},
\]

where $e_1, e_2$ are the beginning and end vertex of the edge $e \in E$ (so $e_1$ is closer to $\rho$ than $e_2$). Our
first result is a large deviation principle for $L_X$, conditional upon the event $\{|T| = n\}$ with $n$ chosen
such that the latter has positive probability. For its formulation recall the definition of the relative
entropy $H(\cdot \,\|\, \cdot)$ from [DZ98, (2.1.5)] and Cramér's rate function



\[
I_p(x) = \sup_{\lambda \in \mathbb{R}} \Big\{ \lambda x - \log \Big[ \sum_{n=0}^{\infty} p(n) e^{\lambda n} \Big] \Big\}, \tag{2.1}
\]

as in [DZ98, (2.1.26)].
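
To make the definition of the empirical pair measure concrete, the following Python sketch computes it for a small typed tree. It is an illustration only, not part of the original argument; the parent-array encoding of the tree and the names `empirical_pair_measure`, `parent` and `types` are assumptions introduced here.

```python
from collections import Counter

def empirical_pair_measure(parent, types):
    """parent[v] is the parent of vertex v (None for the root); types[v] is X(v).
    Returns the uniform measure on the typed edges (X(e_1), X(e_2))."""
    edges = [(p, v) for v, p in enumerate(parent) if p is not None]
    counts = Counter((types[p], types[v]) for p, v in edges)
    return {pair: k / len(edges) for pair, k in counts.items()}

# Hypothetical 5-vertex tree: root 0 has children 1 and 2, vertex 1 has children 3 and 4.
parent = [None, 0, 0, 1, 1]
types = ["a", "b", "a", "a", "b"]
L = empirical_pair_measure(parent, types)
```

In this toy tree the four edges carry four distinct type pairs, so each pair receives mass 1/4.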

Theorem 2.1. Suppose that $T$ is a Galton-Watson tree, with offspring law $p(\cdot)$ such that
$0 < p(0) < 1 - p(1)$, $\sum_{\ell} \ell p(\ell) = 1$ and $\ell^{-1} \log p(\ell) \to -\infty$. Let $X$ be a Markov chain indexed by $T$ with arbitrary
initial distribution and an irreducible Markovian transition kernel $Q$. Then, for $n \to \infty$, the empirical
pair measure $L_X$, conditioned on $\{|T| = n\}$, satisfies a large deviation principle in the space of
probability vectors on $\mathcal{X} \times \mathcal{X}$ with speed $n$ and the convex, good rate function

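\[
I(\mu) = \begin{cases}
H(\mu \,\|\, \mu_1 \otimes Q) + \sum_{a \in \mathcal{X}} \mu_2(a)\, I_p\Big( \frac{\mu_1(a)}{\mu_2(a)} \Big) & \text{if } \mu_1 \ll \mu_2, \\
\infty & \text{otherwise,}
\end{cases} \tag{2.2}
\]

where $\mu_1$ and $\mu_2$ are the first and second marginal of $\mu$ and $\mu_1 \otimes Q(a, b) = Q\{b \,|\, a\}\mu_1(a)$.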
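
As an illustration of how the rate function (2.2) can be evaluated, the following Python sketch computes it for a two-letter alphabet. It is a minimal sketch under stated assumptions: the dictionary encodings of the pair measure and the kernel, the function names, and the sample kernel are all introduced here, and the Cramér term uses the closed form $1 - x + x \log x$ that the examples below quote for the standard Poisson offspring law.

```python
import math

def rel_entropy(mu, nu):
    """H(mu || nu) for dicts on a finite set; infinite if mu is not << nu."""
    total = 0.0
    for key, m in mu.items():
        if m > 0:
            if nu.get(key, 0.0) == 0:
                return math.inf
            total += m * math.log(m / nu[key])
    return total

def pair_rate(mu, Q, cramer):
    """Rate function (2.2): mu[(a, b)] is a pair measure, Q[a][b] = Q{b|a},
    and cramer(x) evaluates the Cramer rate function I_p(x)."""
    alphabet = list(Q)
    mu1 = {a: sum(mu.get((a, b), 0.0) for b in alphabet) for a in alphabet}
    mu2 = {b: sum(mu.get((a, b), 0.0) for a in alphabet) for b in alphabet}
    if any(mu1[a] > 0 and mu2[a] == 0 for a in alphabet):
        return math.inf  # mu_1 is not absolutely continuous w.r.t. mu_2
    ref = {(a, b): mu1[a] * Q[a][b] for a in alphabet for b in alphabet}
    tree_term = sum(mu2[a] * cramer(mu1[a] / mu2[a]) for a in alphabet if mu2[a] > 0)
    return rel_entropy(mu, ref) + tree_term

def poisson_cramer(x):
    """I_p for the standard Poisson law, as quoted in the examples."""
    return 1 - x + (x * math.log(x) if x > 0 else 0.0)

# Hypothetical kernel with stationary law pi = (2/3, 1/3).
Q = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.2, "b": 0.8}}
pi = {"a": 2 / 3, "b": 1 / 3}
mu_typical = {(a, b): pi[a] * Q[a][b] for a in Q for b in Q}
mu_uniform = {(a, b): 0.25 for a in Q for b in Q}
rate_typical = pair_rate(mu_typical, Q, poisson_cramer)
rate_uniform = pair_rate(mu_uniform, Q, poisson_cramer)
```

The rate vanishes (up to rounding) at the pair measure built from the stationary law of $Q$ and is strictly positive at the uniform pair measure, in line with the interpretation of (2.2) as a cost for deviating from typical behaviour.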
Remarks: 

• Throughout the paper we implicitly assume that the conditioning events $\{|T| = n\}$ are of positive
probability, that is, our large deviation approximations of probabilities hold for those values of $n$ where
$\mathbb{P}\{|T| = n\} > 0$. For the general structure of the set $S$ of admissible values, see the proof of Lemma 3.1.

• In case $\sum_{\ell} \ell p(\ell) \neq 1$ note that the distribution of $T$ conditioned on $\{|T| = n\}$ is exactly the same
as when the offspring law is $p_\theta(\ell) = p(\ell) e^{\theta \ell} / \sum_{j} p(j) e^{\theta j}$, regardless of the value of $\theta \in \mathbb{R}$. With
$0 < p(0) < 1 - p(1)$ there exists a unique $\theta^*$ such that $\sum_{\ell} \ell p_{\theta^*}(\ell) = 1$. Hence, Theorem 2.1 still
applies, using $I_{p_{\theta^*}}$ in place of $I_p$ in (2.2).

• The representation (2.2) of $I(\cdot)$ provides the interpretation of the large deviations of $L_X$ as the
result of two independent contributions: when $\mu_1 = \mu_2$ we have only the term $H(\mu \,\|\, \mu_1 \otimes Q)$, which
is the rate function for the large deviation principle of empirical pair measures of the Markov chain
with kernel $Q$, see e.g. [DZ98, Section 3.1.3], while the hard constraint $\mu_1 = \mu_2$ of the Markov chain
setting is replaced here by the additional term $\sum_{a} \mu_2(a) I_p(\mu_1(a)/\mu_2(a))$ reflecting the large deviations
contribution due to the geometry of the tree $T$.
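
The exponential tilting in the second remark can be carried out numerically. The sketch below is an illustration under assumptions: it takes an offspring law with finite support, given as a list, and finds the critical tilt by bisection, using that the tilted mean is increasing in the tilt parameter. The function names and search bounds are hypothetical choices made here.

```python
import math

def tilted_mean(p, theta):
    """Mean of the tilted law p_theta(l), proportional to p(l) * exp(theta * l)."""
    weights = [pl * math.exp(theta * l) for l, pl in enumerate(p)]
    return sum(l * w for l, w in enumerate(weights)) / sum(weights)

def critical_tilt(p, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for theta* with tilted mean 1; assumes 0 < p(0) < 1 - p(1)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tilted_mean(p, mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = [0.25, 0.25, 0.25, 0.25]  # uniform on {0, 1, 2, 3}, supercritical with mean 1.5
theta_star = critical_tilt(p)
```

For this supercritical example the critical tilt is negative, since mass has to be moved towards small offspring numbers to bring the mean down to one.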

Examples: 

The class of Galton-Watson trees conditioned on the total size appears in the combinatorial literature,
see e.g. [MM78], under the name simply generated trees and is surveyed in [Al91]. We look at some
interesting examples.

• Choose the offspring law $p(\cdot)$ such that $p(k) = 1 - p(0) = 1/k$. In this case $\mathbb{P}\{|T| = n\} > 0$ if
and only if $n - 1$ is divisible by $k$. The law of $T$ conditional on $\{|T| = n\}$ is exactly the same as
sampling the tree uniformly from the collection of all possible $k$-ary trees with $n$ vertices. We have
that $I_p(x) = (x/k) \log x + (1 - x/k) \log((1 - x/k)/(1 - 1/k))$, leading to the good rate function

\[
I(\mu) = \begin{cases}
H(\mu \,\|\, \mu_1 \otimes Q) + \frac{k-1}{k} H\Big( \frac{k\mu_2 - \mu_1}{k-1} \,\Big\|\, \mu_2 \Big) + \frac{1}{k} H(\mu_1 \,\|\, \mu_2) & \text{if } k\mu_2 \geq \mu_1, \\
\infty & \text{otherwise,}
\end{cases}
\]

for the large deviation principle of $L_X$.

• Choose the offspring law $p(\cdot)$ as the standard Poisson distribution, $p(\ell) = e^{-1}/\ell!$ for $\ell = 0, 1, 2, \ldots$
Now $\mathbb{P}\{|T| = n\} > 0$ for all $n \geq 1$ and the law of $T$ conditioned on $\{|T| = n\}$ is that of a tree chosen
uniformly from all unordered trees with $n$ vertices. We have $I_p(x) = 1 - x + x \log x$, and get a large
deviations rate of $I(\mu) = H(\mu \,\|\, \mu_1 \otimes Q) + H(\mu_1 \,\|\, \mu_2)$ in (2.2).

• Choose the offspring law $p(\cdot)$ as $p(0) = p(1) = \cdots = p(k) = 1/(k + 1)$. Note that this law is
only critical if $k = 2$, and recall the second remark following Theorem 2.1. Again $\mathbb{P}\{|T| = n\} > 0$
for all $n \geq 1$, and now the law of $T$ conditional on $\{|T| = n\}$ is the same as sampling the tree
uniformly from the collection of all ordered trees with $n$ vertices and offspring number bounded by $k$. $\diamond$
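
The closed forms for Cramér's rate function quoted in the first two examples can be checked numerically against the variational definition (2.1). The following Python sketch is illustrative only: it maximizes the concave objective by ternary search over a bounded window, and it truncates the Poisson law to finitely many terms. The function name, the search window and the truncation level are assumptions made here.

```python
import math

def cramer_rate(p, x, lo=-10.0, hi=5.0, iters=200):
    """Approximates I_p(x) = sup_lam [lam*x - log sum_n p(n) e^{lam*n}] by ternary
    search; the objective is concave in lam. p is a finite list of probabilities."""
    def objective(lam):
        mgf = sum(pn * math.exp(lam * n) for n, pn in enumerate(p))
        return lam * x - math.log(mgf)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)

# Standard Poisson law truncated at 60 terms, and the k-ary law with k = 3.
poisson = [math.exp(-1) / math.factorial(n) for n in range(60)]
k = 3
k_ary = [1 - 1 / k] + [0.0] * (k - 1) + [1 / k]

x = 1.5
poisson_closed = 1 - x + x * math.log(x)
k_ary_closed = (x / k) * math.log(x) + (1 - x / k) * math.log((1 - x / k) / (1 - 1 / k))
poisson_numeric = cramer_rate(poisson, x)
k_ary_numeric = cramer_rate(k_ary, x)
```

The search window has to contain the maximizer, which holds for moderate values of $x$ such as the one used here; for values of $x$ near the boundary of the support a wider window would be needed.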



The result extends to other classes of trees; indeed, one can go much beyond the present setting and
consider trees and types chosen simultaneously according to a multitype Galton-Watson tree. In this
situation, in order to obtain more explicit rate functions, it is useful to replace the empirical pair 
measure by a more inclusive object, the empirical offspring measure. 

We write $\mathcal{X}^* = \bigcup_{n=0}^{\infty} \{n\} \times \mathcal{X}^n$ and equip it with the discrete topology. Note that the offspring of any
vertex $v \in T$ is characterized by an element of $\mathcal{X}^*$ and that there is an element $(0, \emptyset)$ in $\mathcal{X}^*$ symbolizing
lack of offspring. For each typed tree $X$ and each vertex $v$ we denote by

\[
C(v) = (N(v), X_1(v), \ldots, X_{N(v)}(v)) \in \mathcal{X}^*
\]

the number and types of the children of $v$, ordered from left to right. To each sample chain $X$ we
associate a probability measure $M_X$ on $\mathcal{X} \times \mathcal{X}^*$ called the empirical offspring measure, which is defined
by

\[
M_X(a, c) = \frac{1}{|V|} \sum_{v \in V} \delta_{(X(v), C(v))}(a, c).
\]

We now describe the joint law of a tree $T$ and tree-indexed chain $X$, which defines a multitype
Galton-Watson tree. The ingredients are a probability measure $\mu$ on $\mathcal{X}$, serving as the initial distribution,
and an offspring transition kernel $Q$ from $\mathcal{X}$ to $\mathcal{X}^*$. We define the law $\mathbb{P}$ of a tree-indexed process $X$
by the following rules:

• The root $\rho$ carries a random type $X(\rho)$ chosen according to the probability measure $\mu$ on $\mathcal{X}$.
• For each vertex with type $a \in \mathcal{X}$ the offspring number and types are given independently of
everything else, by the offspring law $Q\{\cdot \,|\, a\}$ on $\mathcal{X}^*$. We write

\[
Q\{\cdot \,|\, a\} = Q\{(N, X_1, \ldots, X_N) \in \cdot \,|\, a\},
\]

i.e. we have a random number $N$ of offspring particles with types $X_1, \ldots, X_N$.

We assume that all exponential moments are finite, that is, $Q\{e^{\eta N} \,|\, a\} < \infty$ for all $a \in \mathcal{X}$ and $\eta > 0$. We also need
a weak form of irreducibility assumption. Denote, for every $c = (n, a_1, \ldots, a_n) \in \mathcal{X}^*$ and $a \in \mathcal{X}$, the
multiplicity of the symbol $a$ in $c$ by

\[
m(a, c) = \sum_{i=1}^{n} 1_{\{a_i = a\}}.
\]

Define the matrix $A$ with index set $\mathcal{X} \times \mathcal{X}$ and nonnegative entries by

\[
A(a, b) = \sum_{c \in \mathcal{X}^*} Q\{c \,|\, b\}\, m(a, c), \quad \text{for } a, b \in \mathcal{X},
\]

i.e. $A(a, b)$ is the expected number of offspring of type $a$ of a vertex of type $b$. With
$A^*(a, b) = \sum_{k=1}^{\infty} A^k(a, b) \in [0, \infty]$ we say that the matrix $A$ is weakly irreducible if $\mathcal{X}$ can be partitioned into a
non-empty set $\mathcal{X}_r$ of recurrent states and a disjoint set $\mathcal{X}_t$ of transient states such that

• $A^*(a, b) > 0$ whenever $b \in \mathcal{X}_r$, while

• $A^*(a, b) = 0$ whenever $b \in \mathcal{X}_t$ and either $a = b$ or $a \in \mathcal{X}_r$.

For example, any irreducible matrix $A$ has $A^*$ strictly positive, hence is also weakly irreducible with
$\mathcal{X}_r = \mathcal{X}$. The multitype Galton-Watson tree is called weakly irreducible (or irreducible) if the matrix $A$
is weakly irreducible (or irreducible, respectively) and the number $\sum_{a \in \mathcal{X}_t} m(a, c)$ of transient offspring
is uniformly bounded under $Q$.

Note that a weakly irreducible matrix has $A(a, b) = 0$ whenever $b \in \mathcal{X}_t$ and $a \in \mathcal{X}_r$. Moreover
$\mathcal{X}_t$ may be ordered such that $A(a, b) = 0$ when $a \geq b$ are both in $\mathcal{X}_t$. Consequently, the non-zero
eigenvalues of a weakly irreducible matrix $A$ are exactly those of the irreducible matrix obtained by
its restriction to $\mathcal{X}_r$. Recall that, by the Perron-Frobenius theorem, see e.g. [DZ98, Theorem 3.1.1],
the largest eigenvalue of an irreducible matrix is real and positive. Obviously, the same applies to
weakly irreducible matrices. The multitype Galton-Watson tree is called critical if this eigenvalue is
1 for the matrix $A$.
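
Criticality can be checked numerically for a concrete offspring kernel. The sketch below is a hedged illustration for the irreducible case only: it encodes an offspring configuration as the tuple of child types (the offspring number being the tuple length), builds the mean matrix, and approximates its largest eigenvalue by power iteration on the shifted matrix $A + I$, which avoids oscillation when $A$ is periodic. The encoding, the function names and the example kernel are assumptions introduced here.

```python
def mean_matrix(Q, alphabet):
    """A[a][b] = sum_c Q{c|b} m(a, c): expected number of type-a children
    of a type-b vertex. Q[b] maps child-type tuples c to probabilities."""
    return {a: {b: sum(pr * c.count(a) for c, pr in Q[b].items()) for b in alphabet}
            for a in alphabet}

def perron_root(A, alphabet, iters=2000):
    """Largest eigenvalue of an irreducible nonnegative matrix, by power
    iteration on A + I (the shift makes the iteration matrix aperiodic)."""
    x = {a: 1.0 for a in alphabet}
    lam = 0.0
    for _ in range(iters):
        y = {a: x[a] + sum(A[a][b] * x[b] for b in alphabet) for a in alphabet}
        lam = sum(y.values())          # equals (root + 1) once x sums to one
        x = {a: y[a] / lam for a in alphabet}
    return lam - 1.0

# Hypothetical periodic example: each vertex has 0 or 2 children of the other type.
alphabet = ["A", "B"]
Q = {"A": {(): 0.5, ("B", "B"): 0.5}, "B": {(): 0.5, ("A", "A"): 0.5}}
A = mean_matrix(Q, alphabet)
rho = perron_root(A, alphabet)
```

Here the mean matrix swaps the two types, so its largest eigenvalue is one and the example is critical.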

Our second main result is a large deviation principle for $M_X$ if $X$ is a multitype Galton-Watson tree.
For its formulation denote, for every probability measure $\nu$ on $\mathcal{X} \times \mathcal{X}^*$, by $\nu_1$ the $\mathcal{X}$-marginal of $\nu$.
We call $\nu$ shift-invariant if

\[
\nu_1(a) = \sum_{(b, c) \in \mathcal{X} \times \mathcal{X}^*} m(a, c)\, \nu(b, c) \quad \text{for all } a \in \mathcal{X}.
\]
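
The empirical offspring measure and the shift-invariance condition can be explored on a small example. In the Python sketch below, which is illustrative only, a tree is encoded by left-to-right child lists and an offspring configuration by the tuple of child types; these encodings and all names are assumptions made here. For a finite tree the two sides of the shift-invariance identity differ only through the root, which is a vertex but nobody's child.

```python
from collections import Counter

def empirical_offspring_measure(children, types):
    """children[v] lists the children of v from left to right; types[v] is X(v).
    Returns the empirical offspring measure as {(a, c): mass}, c a tuple of child types."""
    n = len(children)
    counts = Counter((types[v], tuple(types[u] for u in children[v])) for v in range(n))
    return {key: k / n for key, k in counts.items()}

def shift_defect(nu, alphabet):
    """nu_1(a) minus sum over (b, c) of m(a, c) nu(b, c), for each type a;
    nu is shift-invariant exactly when every entry vanishes."""
    return {a: sum(w for (b, c), w in nu.items() if b == a)
               - sum(c.count(a) * w for (b, c), w in nu.items())
            for a in alphabet}

# Hypothetical 5-vertex tree with root 0 of type "a".
children = [[1, 2], [3, 4], [], [], []]
types = ["a", "b", "a", "a", "b"]
M = empirical_offspring_measure(children, types)
defect = shift_defect(M, ["a", "b"])
```

In this example the defect equals $1/5$ at the root type and zero at the other type, so the empirical offspring measure of a large tree is shift-invariant up to an error of order $1/n$.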

We denote by $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ the space of probability measures $\nu$ on $\mathcal{X} \times \mathcal{X}^*$ with $\int n\, \nu(da, dc) < \infty$,
using the convention $c = (n, a_1, \ldots, a_n)$. We endow this space with the smallest topology which
makes the functionals $\nu \mapsto \int f(b, c)\, \nu(db, dc)$ continuous, for $f : \mathcal{X} \times \mathcal{X}^* \to \mathbb{R}$ either bounded, or
$f(b, c) = m(a, c) 1_{b_0}(b)$ for some $a, b_0 \in \mathcal{X}$. Define the function $J$ on $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ by

\[
J(\nu) = \begin{cases}
H(\nu \,\|\, \nu_1 \otimes Q) & \text{if } \nu \text{ is shift-invariant,} \\
\infty & \text{otherwise.}
\end{cases}
\]

In general, the topology on $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ is stronger than the weak topology, making the function $J$
lower semicontinuous, as shown in Lemma 3.4.

Theorem 2.2. Suppose that $X$ is a weakly irreducible, critical multitype Galton-Watson tree with an
offspring law whose exponential moments are all finite, conditioned to have exactly $n$ vertices. Then,
for $n \to \infty$, the empirical offspring measure $M_X$ satisfies a large deviation principle in $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$
with speed $n$ and the convex, good rate function $J$.

Examples: 

• The situation of Theorem 2.1 corresponds to offspring kernels $Q\{\cdot \,|\, a\}$ choosing offspring numbers
according to the law $p(\cdot)$ and then choosing the offspring types independently, according to the
marginal law $Q\{\cdot \,|\, a\}$ on $\mathcal{X}$. Consequently, Theorem 2.1 follows by contraction from Theorem 2.2, see
Section 3.4 for more details. As its proof reveals, Theorem 2.1 applies even when the law of offspring
numbers $p(\cdot \,|\, a)$ depends on the type of the parent, provided the matrix $Q\{b \,|\, a\} \sum_{\ell} \ell p(\ell \,|\, a)$ is weakly
irreducible, with largest eigenvalue one (then, of course, $I_{p(\cdot | a)}$ replaces $I_p$ in (2.2)).

• For a more concrete example contained in our framework, we suppose that individuals in a population
may have two genetic types, $A$ and $B$. Individuals of type $A$ (resp. $B$) breed offspring according to
the law $p_A$ (resp. $p_B$), typically of the same type, but independently, mutations occur with a small
probability $p > 0$. Denote by $\eta$ the ratio of the mean offspring numbers of $p_A$ and $p_B$, representing
the genetic advantage of type $A$. In a large family of size $n$ the probability that the ratio of the
numbers of individuals of type $A$ and $B$ in the population is close to $x \in [0, 1]$ is approximately equal
to $\exp(-n I(x))$ for

\[
I(x) = \inf \Big\{ \frac{x}{x+1} H(\nu_A \,\|\, q_A) + \frac{1}{x+1} H(\nu_B \,\|\, q_B) \Big\},
\]

where $q_A(n, m) = p_A(n + m) \binom{n+m}{n} p^m (1 - p)^n$ and $q_B(n, m) = p_B(n + m) \binom{n+m}{n} p^n (1 - p)^m$, and the
infimum is over all probability measures $\nu_A, \nu_B$ on $\mathbb{N} \times \mathbb{N}$ satisfying

\[
x = \sum_{n,m=0}^{\infty} \big( n x\, \nu_A(n, m) + n\, \nu_B(n, m) \big) \quad \text{and} \quad 1 = \sum_{n,m=0}^{\infty} \big( m x\, \nu_A(n, m) + m\, \nu_B(n, m) \big).
\]

This rate function is zero exactly at the typical ratio, which is given by the solution $x > 0$ of the
equation $x/(1 + x) = (x\eta(1 - p) + p)/(x\eta + 1)$. Our result gives the probability of a significant
deviation from this ratio; the precise rate depends of course on the exact offspring laws of
particles of either genetic type, represented by $p_A, p_B$. $\diamond$
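
The typical ratio in the genetic example can be computed in closed form. Clearing denominators in the equation for the typical ratio gives the quadratic $\eta p x^2 + (1 - p)(1 - \eta) x - p = 0$, whose positive root is evaluated in the Python sketch below. This is a worked illustration; the function name and the sample parameter values are assumptions introduced here.

```python
import math

def typical_ratio(eta, p):
    """Positive root of eta*p*x^2 + (1-p)*(1-eta)*x - p = 0, which is the equation
    x/(1+x) = (x*eta*(1-p) + p)/(x*eta + 1) after clearing denominators."""
    b = (1 - p) * (1 - eta)
    return (-b + math.sqrt(b * b + 4 * eta * p * p)) / (2 * eta * p)

def residual(x, eta, p):
    """Difference of the two sides of the original equation at x."""
    return x / (1 + x) - (x * eta * (1 - p) + p) / (x * eta + 1)

x_neutral = typical_ratio(1.0, 0.1)     # no genetic advantage
x_advantage = typical_ratio(2.0, 0.05)  # type A has twice the mean offspring number
```

Without a genetic advantage ($\eta = 1$) the typical ratio is one for every mutation probability, while an advantage for type $A$ pushes the ratio above one.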



We conclude with the extension to a process level large deviation principle. For the rest of this section
we assume that the offspring numbers generated by the kernel $Q$ are uniformly bounded by some
$N_0 \in \mathbb{N}$. We denote by $\mathbb{X}$ the set of all finite or infinite rooted, planar trees such that every vertex
has at most $N_0$ offspring, with types from the finite alphabet $\mathcal{X}$ attached to the vertices. Recall that
the fact that the trees are embedded in the plane imposes an ordering (say from left to right) on the
children of each vertex.

The laws of multitype Galton-Watson trees are probability measures on $\mathbb{X}$. We equip $\mathbb{X}$ with the
topology generated by the functions $f : \mathbb{X} \to \mathbb{R}$ depending only on a finite number of generations.

If $v \in V$ is a vertex of a tree $T$ and $X \in \mathbb{X}$ a sample chain on this tree, we denote by $X^v$ the sample
chain obtained from the subtree of $T$ consisting of $v$ and all successors of $v$. To each finite sample
chain $X$ we associate a probability measure $T_X$ on $\mathbb{X}$, the empirical subtree measure, which is defined
by

\[
T_X(x) = \frac{1}{|V|} \sum_{v \in V} \delta_{X^v}(x), \quad \text{for } x \in \mathbb{X}.
\]
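
The empirical subtree measure can be computed directly for a small typed tree. The Python sketch below is illustrative only; it identifies each typed planar subtree with a nested tuple of the form (type, tuple of child subtrees), an encoding chosen here for convenience, and the function names are likewise assumptions.

```python
from collections import Counter

def subtree_code(v, children, types):
    """Nested-tuple encoding of the typed planar subtree formed by v and its descendants."""
    return (types[v], tuple(subtree_code(u, children, types) for u in children[v]))

def empirical_subtree_measure(children, types):
    """Uniform measure on the n typed subtrees of an n-vertex tree."""
    n = len(children)
    counts = Counter(subtree_code(v, children, types) for v in range(n))
    return {code: k / n for code, k in counts.items()}

# Hypothetical 5-vertex tree: root 0 has children 1 and 2, vertex 1 has children 3 and 4.
children = [[1, 2], [3, 4], [], [], []]
types = ["a", "b", "a", "a", "b"]
T = empirical_subtree_measure(children, types)
leaf_a = ("a", ())
```

Two of the five vertices are leaves of type "a", so the single-vertex tree of type "a" receives mass 2/5, and the measure charges four distinct subtrees in total.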

To formulate a large deviation principle for the random variable $T_X$ we need further notation. We
denote by $N[k]$ the number of vertices in generation $k$, and in particular by $N = N[1]$ the number of
children of the root in $T$. Suppose that $\mu$ is a probability measure on $\mathbb{X}$ with $\int N \, d\mu = 1$. Then we
can define a shifted probability measure $S(\mu)$ on $\mathbb{X}$ by

\[
S(\mu)(\Gamma) = \int d\mu(X) \sum_{i=1}^{N} 1_{\{X^{v_i} \in \Gamma\}}, \quad \text{for any Borel set } \Gamma \subset \mathbb{X}, \tag{2.3}
\]

where $v_1, \ldots, v_N$ are the children of the root. We call $\mu$ shift-invariant if $S(\mu) = \mu$.

To any shift-invariant measure $\mu$ on $\mathbb{X}$ we can associate a backward tree measure $\mu^*$ in the following
way. Suppose that $X$ is a sample chain on a (finite or infinite) tree of height at least $k$, and mark a
vertex in generation $k$ of $X$ as the centre of the tree. Denote by $\mathbb{X}[k]$ the set of all objects $(x, \xi)$ (typed
tree $x$ with centre at $\xi$) arising in this way, endowed with the canonical topology inherited from $\mathbb{X}$.
For $k \geq l$ there are canonical projections $p_{kl} : \mathbb{X}[k] \to \mathbb{X}[l]$ obtained by keeping the same centre and
removing all vertices from the tree whose last common ancestor with the centre lived before generation
$k - l$. Note that the root of the projected tree $p_{kl} X$ is the ancestor of the centre in generation $k - l$.
The spaces $\mathbb{X}[l]$ and projections $p_{kl}$, $k \geq l$, form a projective system. Hence there exists a projective
limit space $\mathbb{X}_-$, the space of backward trees, and canonical projections $p_k : \mathbb{X}_- \to \mathbb{X}[k]$. See [DZ98,
Appendix B] for more information about projective limits.

If $\mu$ is a shift-invariant measure then we can associate a measure $\mu_k$ on $\mathbb{X}[k]$ by

\[
\mu_k(\Gamma) = \int d\mu(X) \sum_{i=1}^{N[k]} 1_{\{(X, v_i) \in \Gamma\}}, \quad \text{for any Borel set } \Gamma \subset \mathbb{X}[k],
\]

where $v_1, \ldots, v_{N[k]}$ are the vertices in generation $k$ of $X$.

Shift-invariance of $\mu$ ensures that all $\mu_k$ are probability measures and that $\mu_l = \mu_k \circ p_{kl}^{-1}$ for all $k \geq l$.
Hence, by Kolmogorov's extension theorem, there exists a unique probability measure $\mu^*$ on $\mathbb{X}_-$ such
that $\mu^* \circ p_k^{-1} = \mu_k$. This is the backward tree measure $\mu^*$ associated to $\mu$.

For each $k \geq 1$ we denote by $p_{1,k} : \mathbb{X}[k] \to \mathbb{X}[k]$ the projection obtained by removing all vertices of
distance at least $k + 2$ from the root and all those of distance $k + 1$ from the root whose parent is to
the right of the centre. Similarly, we denote by $p_{0,k} : \mathbb{X}[k] \to \mathbb{X}[k]$ the projection which in addition to
all the vertices removed by $p_{1,k}$ also removes all children of the centre. Note that $p_{kl} \circ p_{0,k} = p_{0,l} \circ p_{kl}$
and $p_{kl} \circ p_{1,k} = p_{1,l} \circ p_{kl}$ for all $k \geq l$. Hence, the projective limits $p_1 : \mathbb{X}_- \to \mathbb{X}_-$ and $p_0 : \mathbb{X}_- \to \mathbb{X}_-$ of
$p_{1,k}$ and $p_{0,k}$, respectively, are well defined with $p_k \circ p_0 = p_{0,k} \circ p_k$ and $p_k \circ p_1 = p_{1,k} \circ p_k$ for all $k \geq 1$
(heuristically, $p_1$ is the projection obtained by removing all vertices of the backward tree further from
the root than the centre except the children of the centre and those of the vertices to the right of the
centre whose distance from the root is the same as the centre, with $p_0$ removing also the children of
the centre). If $Q$ is an offspring transition kernel, we define $\mu^* \circ p_0^{-1} \otimes Q$ as the probability measure
generated by starting with a backward tree sampled according to $\mu^* \circ p_0^{-1}$ and adding independently
offspring according to $Q$ to the centre. Let $\mathcal{M}(\mathbb{X})$ be the set of probability measures on $\mathbb{X}$. Define
the function $K$ on $\mathcal{M}(\mathbb{X})$ by

\[
K(\mu) = \begin{cases}
H(\mu^* \circ p_1^{-1} \,\|\, \mu^* \circ p_0^{-1} \otimes Q) & \text{if } \mu \text{ is shift-invariant,} \\
\infty & \text{otherwise.}
\end{cases}
\]

We equip $\mathcal{M}(\mathbb{X})$ with the smallest topology which makes the functionals $\mu \mapsto \int f \, d\mu$ continuous, for
each continuous and bounded $f : \mathbb{X} \to \mathbb{R}$.

Theorem 2.3. Suppose that $X$ is an irreducible, critical multitype Galton-Watson tree with uniformly
bounded offspring sizes, conditioned to have exactly $n$ vertices. Then, for $n \to \infty$, the empirical
subtree measure $T_X$ satisfies a large deviation principle in $\mathcal{M}(\mathbb{X})$ with speed $n$ and the convex, good
rate function $K$.



We now give a brief overview of the following sections, which contain the proofs of our results. First
we need to establish the fact that for a critical multitype Galton-Watson tree the probabilities of our
conditioning events $\{|T| = n\}$ decay with an exponential rate zero over the set of admissible values of $n$. The proof of
this fact, well-known for single-type Galton-Watson trees, requires a careful analysis of the lattice
structure of the set $S = \{n \in \mathbb{N} : \mathbb{P}\{|T| = n\} > 0\}$ in the multitype case, and is of some independent
interest. This result is proved in Section 3.1.

Equipped with this result, in Section 3.2 the upper bound of Theorem 2.2 is derived. Exponential
tightness is established in the topology on $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ using the moment conditions imposed on $Q$.
Based on the exponential Chebyshev inequality we first represent the upper bound in a variational
form, and then solve the variational problem. Nonstandard arguments arise in the proof from the fact
that we endow $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with a topology which is stronger than the weak topology of measures.
This is necessary in order to make the set of shift-invariant measures a closed set in our topology.

The lower bound, proved in Section 3.3, is based on a change of measure technique. As we allow for
potentially unbounded offspring numbers, intricate approximation arguments are needed to show that
this change of measure provides sufficient freedom to represent a sufficiently large class of offspring
measures. In Section 3.4 we prove Theorem 2.1 by contraction from Theorem 2.2.

Finally, in Section 3.5 we prove Theorem 2.3. For this purpose we first extend Theorem 2.2 from
one-generation offspring measures to $k$-generation offspring measures, see Lemma 3.8. This extension
is based on expanding the state space and crucially needs the fact that in Theorem 2.2 we are only
requiring weak irreducibility. The step from $k$-generation offspring measures to empirical subtree
measures is then based on the Dawson-Gärtner Theorem.

3. Proof of the large deviation principles 

3.1 On the rate of decay of $\mathbb{P}\{|T| = n\}$.

An important role in our proofs is played by the fact that for critical multitype Galton-Watson trees the
probability $\mathbb{P}\{|T| = n\}$ decays only subexponentially on the set $S$ of integers $n$ where the probability
is positive. We exclude the trivial case when $S$ fails to be infinite from our consideration (in particular,
we assume throughout that $\mu(\mathcal{X}_r) > 0$).

Lemma 3.1. Suppose $T$ is the random tree generated by a weakly irreducible, critical multitype
Galton-Watson tree with finite second moment. Then

\[
\lim_{\substack{n \to \infty \\ n \in S}} \frac{1}{n} \log \mathbb{P}\{|T| = n\} = 0.
\]

Proof. Recall that the number of children of any given $v \in T$ with types in $\mathcal{X}_t$ is uniformly bounded.
Moreover, if $X(u) \in \mathcal{X}_t$ for some $u \in T$ then there are only types from $\mathcal{X}_t$ in the sample chain $X^u$
consisting of $u$ and all successors of $u$, and the height of the corresponding subtree $T^u$ is uniformly
bounded (by the size of $\mathcal{X}_t$). Let $G(v) = \sum_i |T^{u_i}|$ over the children $u_1, u_2, \ldots$ of $v$ such that $X(u_i) \in \mathcal{X}_t$.
Hence $G(v)$ is also uniformly bounded, say by $N_1 < \infty$. For $c \in \mathcal{X}^*$ let $c|_{\mathcal{X}_r}$ be the natural restriction
of $c$ to $\mathcal{X}_r^*$. For each $b \in \mathcal{X}_r$, $c \in \mathcal{X}_r^*$ and $g \in \{0, \ldots, N_1\}$ let $Q_r\{(c, g) \,|\, b\}$ denote the probability
induced by $Q$ that given $X(v) = b$ we have $C(v)|_{\mathcal{X}_r} = c$ and $G(v) = g$. Then, for each $c_r \in \mathcal{X}_r^*$,

\[
\sum_{g=0}^{N_1} Q_r\{(c_r, g) \,|\, b\} = \sum_{\{c \in \mathcal{X}^* \,:\, c|_{\mathcal{X}_r} = c_r\}} Q\{c \,|\, b\},
\]

so $Q_r$ is a transition probability measure from $\mathcal{X}_r$ to $\mathcal{X}_r^* \times \{0, \ldots, N_1\}$ such that
$A_r(a, b) = \sum_{c, g} m(a, c)\, Q_r\{(c, g) \,|\, b\}$ is exactly the restriction of the matrix $A$ to $\mathcal{X}_r$. In particular, since $A$ is
weakly irreducible and critical, it follows that $A_r$ is irreducible and critical on $\mathcal{X}_r$. Further, $Q_r$
constructs the restriction of the multitype Galton-Watson tree $X$ to $\mathcal{X}_r$ with $G(v)$ keeping track of the
number of vertices with types in $\mathcal{X}_t$ that have been omitted as a result of being in $T^u$ for some child
$u$ of $v$ such that $X(u) \in \mathcal{X}_t$. Thus, fix a type $a \in \mathcal{X}_r$ and construct a multitype Galton-Watson tree
with law $\mathbb{P}$, for $\mu = \delta_a$, as follows: Start at size $n = 0$ with one active vertex $\rho$ of type $a$. At each
future step choose an active vertex $v$ uniformly from all active vertices, independently of everything
else, provide it with offspring $C(v)$ according to $Q_r\{\cdot \,|\, X(v)\}$, adding $G(v) + 1$ to the current tree
size $n$, deactivating $v$ and activating its offspring. When there are no active vertices left, the process
terminates, producing the restriction to $\mathcal{X}_r$ of a typed tree of law $\mathbb{P}$ and size $n$ for $\mu = \delta_a$.
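
The active-vertex construction just described can be simulated directly. The Python sketch below is a simplified illustration, not the construction used in the proof: it assumes that there are no transient types, so that $G(v) = 0$ and each step adds exactly one vertex, and it only records the final size. The kernel encoding, the function name and the size cap are assumptions made here.

```python
import random

def explore_size(Q, root_type, rng, max_size=10**4):
    """Q[a] is a list of (child_types_tuple, probability) pairs for parent type a.
    Runs the active-vertex exploration and returns the total tree size,
    or None if the size exceeds max_size."""
    active = [root_type]          # types of the currently active vertices
    size = 0
    while active:
        i = rng.randrange(len(active))              # uniform choice among active vertices
        active[i], active[-1] = active[-1], active[i]
        a = active.pop()                            # deactivate the chosen vertex
        outcomes, weights = zip(*Q[a])
        c = rng.choices(outcomes, weights=weights)[0]
        active.extend(c)                            # activate its offspring
        size += 1                                   # G(v) = 0: no transient types here
        if size > max_size:
            return None
    return size

# Hypothetical single-type critical law: zero or two children with probability 1/2 each.
Q = {"a": [((), 0.5), (("a", "a"), 0.5)]}
sizes = [explore_size(Q, "a", random.Random(seed)) for seed in range(50)]
```

For this offspring law every finite tree has an odd number of vertices, which illustrates why the set of admissible sizes can be a proper lattice and why the proof has to keep track of periods.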

Let $p_{a,b}(n)$ be the probability that when the size is $n$ we have exactly one active vertex, which is of
type $b$. For any $a_1, a_2, a_3 \in \mathcal{X}_r$ and positive integers $n_1, n_2$ we have

\[
p_{a_1,a_2}(n_1)\, p_{a_2,a_3}(n_2) \leq p_{a_1,a_3}(n_1 + n_2). \tag{3.1}
\]

Indeed, $p_{a_1,a_2}(n_1)\, p_{a_2,a_3}(n_2)$ is the probability of having exactly one active vertex when the size is $n_1$
and again when the size is $n_1 + n_2$, having types $a_2$ and $a_3$, respectively.

Since the restricted multitype Galton-Watson tree is irreducible, starting with $a \in \mathcal{X}_r$, active vertices
of each type appear with positive probability and our procedure allows each active vertex to eventually
remain the only active vertex with positive probability. Hence for any $a_1, a_2 \in \mathcal{X}_r$, there exists $n$ such
that $p_{a_1,a_2}(n) > 0$. Together with (3.1) this suffices to make the structure of the sets

\[
S_{a,b} = \{n \in \mathbb{N} : p_{a,b}(n) > 0\}
\]

for $a, b \in \mathcal{X}_r$, analogous to that of the sets $\{n \in \mathbb{N} : (P^n)_{a,b} > 0\}$ for a finite state irreducible Markov
chain with transition matrix $P$. Namely, there exists a period $d = \gcd S_{a,a}$, independent of $a \in \mathcal{X}_r$,
and $k_{a,b} \in \{0, \ldots, d - 1\}$ such that $S_{a,b} \subset k_{a,b} + d\mathbb{N}$ with $|(k_{a,b} + d\mathbb{N}) \setminus S_{a,b}| < \infty$, see for example the
proof in [Du96, Lemmas 5.5.3, 5.5.4 and 5.5.6]. Analogously to the theory of $d$-periodic finite state
irreducible Markov chains, (3.1) and subadditivity imply the existence of $I < \infty$ such that, for all
$a, b \in \mathcal{X}_r$,

\[
\lim_{l \to \infty} -\frac{1}{ld} \log p_{a,b}(k_{a,b} + ld) = I.
\]

(Indeed, one can take first $a = b \in \mathcal{X}_r$, showing existence of limits $I_{a,a} < \infty$, then show that $I_{a,a} \leq I_{b,b}$
for all $a, b \in \mathcal{X}_r$, hence for each such $a$ and $b$ the limit $I_{a,b}$ exists and is equal to $I_{a,a}$ by a sandwich
argument). Now let $p_a(n) = \mathbb{P}\{|T| = n \,|\, X(\rho) = a\}$, $S_a = \{n : p_a(n) > 0\}$ and
$\mathcal{X}_g = \{b : Q_r\{((0, \emptyset), g) \,|\, b\} > 0\}$, noting that the latter set is nonempty for some $g$ (otherwise no finite trees are
possible). The event $\{|T| = n\}$ corresponds to one active vertex from $\mathcal{X}_g$ at size $n - 1 - g$ producing $g$
omitted vertices of types from $\mathcal{X}_t$ and no offspring with type in $\mathcal{X}_r$. Summing over the possible types
of this vertex we get

\[
p_a(n) = \sum_{g=0}^{N_1} \sum_{b \in \mathcal{X}_g} p_{a,b}(n - 1 - g)\, Q_r\{((0, \emptyset), g) \,|\, b\},
\]

implying that $S_a = \{n : n - 1 - g \in S_{a,b} \text{ for some } b \in \mathcal{X}_g\}$ and for any $a \in \mathcal{X}_r$,

\[
\lim_{\substack{n \to \infty \\ n \in S_a}} -\frac{1}{n} \log p_a(n) = I.
\]

Suppose for contradiction that $I > 0$. Then, for $a \in \mathcal{X}_r$ and all $n \in S_a$ with $n > n_0$, we have
$p_a(n) \le \exp(-nI/2)$. As $p_a(n) = 0$ for all $n \notin S_a$, this implies that

$$\mathbb{P}\{|T| \ge n \mid X(\rho) = a\} \le \frac{\exp(-nI/2)}{1 - \exp(-I/2)} \quad \text{for all } n > n_0.$$

But this probability is at least as large as the corresponding probability for the restriction of $T$ to
vertices whose type is in $\mathcal{X}_r$. The latter is an irreducible, critical multitype Galton-Watson tree, so
by the corollary in [AN72, p. 191] under the hypothesis of finite second moment this probability is
bounded below by a constant multiple of $1/n$, which is a contradiction. Hence, $I = 0$ and the result
of the lemma follows since by the weak irreducibility of $X$ we have that $p_a(n) = 0$ for all $n > n_0$ and
$a \in \mathcal{X}_t$. □
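
The periodic structure invoked in the proof above is modelled on finite state irreducible Markov chains, where it can be checked numerically. The following is a minimal sketch and not part of the original argument: the helper names `return_times` and `period` and both example matrices are hypothetical, and the gcd is taken only over return times up to a cutoff `n_max`, assumed large enough for such small chains.

```python
from math import gcd

def return_times(P, a, n_max):
    """Return all n <= n_max with (P^n)[a][a] > 0, tracking supports only."""
    size = len(P)
    reach = [j == a for j in range(size)]  # states reachable from a in 0 steps
    times = []
    for n in range(1, n_max + 1):
        # One more step of the chain: j is reachable if some reachable i has P[i][j] > 0.
        reach = [any(reach[i] and P[i][j] > 0 for i in range(size)) for j in range(size)]
        if reach[a]:
            times.append(n)
    return times

def period(P, a, n_max=50):
    """gcd of the return times to state a that are at most n_max."""
    d = 0
    for n in return_times(P, a, n_max):
        d = gcd(d, n)
    return d

# Hypothetical example 1: deterministic cycle 0 -> 1 -> 2 -> 0.
P_cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Hypothetical example 2: the same cycle with a holding probability at state 0.
P_lazy = [[0.5, 0.5, 0], [0, 0, 1], [1, 0, 0]]

periods_cycle = [period(P_cycle, a) for a in range(3)]
periods_lazy = [period(P_lazy, a) for a in range(3)]
```

In the cycle every return time is a multiple of 3, so the period is the same for each starting state, which mirrors the statement that $d = \gcd S_{a,a}$ does not depend on $a$. The holding probability at state 0 makes the second chain aperiodic.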

3.2 Proof of the upper bound in Theorem 2.2

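Given a bounded function $\tilde g : \mathcal{X} \times \mathcal{X}^* \to \mathbb{R}$ we define the function

$$U_{\tilde g}(a) = \log \sum_{c \in \mathcal{X}^*} Q\{c \mid a\}\, e^{\tilde g(a,c)},$$

for $a \in \mathcal{X}$. We use $\tilde g$ to define a new multitype Galton-Watson tree as follows:

• The type of the root $\rho$ is $a \in \mathcal{X}$ with probability $e^{U_{\tilde g}(a)} \mu(a) \big/ \int e^{U_{\tilde g}(b)}\, \mu(db)$;

• for each vertex with type $a \in \mathcal{X}$ the offspring number and types are given, independently of
everything else, by the offspring law $\tilde Q\{\,\cdot \mid a\}$ given by

$$\tilde Q\{c \mid a\} = \exp\big(\tilde g(a,c) - U_{\tilde g}(a)\big)\, Q\{c \mid a\}.$$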
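
As an illustration of the tilting just described, the following sketch computes $U_{\tilde g}$ and the tilted offspring law for a finitely supported $Q$. It is only an example under stated assumptions: the two-type law `Q`, the choice $\tilde g(a,c) = 0.3\,n$ and the function name `tilt_offspring_law` are hypothetical, and an offspring vector $c = (n, a_1, \ldots, a_n)$ is encoded as the Python tuple $(a_1, \ldots, a_n)$.

```python
import math

def tilt_offspring_law(Q, g):
    """Return (U, Q_tilde) with U[a] = log sum_c Q{c|a} exp(g(a, c)) and
    Q_tilde[a][c] = exp(g(a, c) - U[a]) * Q[a][c]."""
    U, Q_tilde = {}, {}
    for a, law in Q.items():
        U[a] = math.log(sum(p * math.exp(g(a, c)) for c, p in law.items()))
        Q_tilde[a] = {c: math.exp(g(a, c) - U[a]) * p for c, p in law.items()}
    return U, Q_tilde

# Hypothetical offspring law on the types "A" and "B" (illustration only).
Q = {
    "A": {(): 0.5, ("A", "B"): 0.5},
    "B": {(): 0.25, ("A",): 0.5, ("A", "B"): 0.25},
}

def g_example(a, c):
    # Hypothetical bounded function: reward each child by 0.3.
    return 0.3 * len(c)

U, Q_tilde = tilt_offspring_law(Q, g_example)
row_sums = {a: sum(law.values()) for a, law in Q_tilde.items()}
```

Each tilted law sums to one because $U_{\tilde g}(a)$ is exactly the normalising constant, and the support of $\tilde Q\{\,\cdot \mid a\}$ coincides with that of $Q\{\,\cdot \mid a\}$.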

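We denote the transformed law by $\tilde{\mathbb{P}}$ and make the simple observation that $\tilde{\mathbb{P}}$ is absolutely continuous
with respect to $\mathbb{P}$, as for each finite typed tree $X$,

$$\frac{d\tilde{\mathbb{P}}}{d\mathbb{P}}(X) = \frac{e^{U_{\tilde g}(X(\rho))}}{\int e^{U_{\tilde g}(b)}\, \mu(db)} \prod_{v \in V} \exp\big[\tilde g(X(v), C(v)) - U_{\tilde g}(X(v))\big] \qquad (3.2)$$

$$= \frac{1}{\int e^{U_{\tilde g}(a)}\, \mu(da)} \prod_{v \in V} \exp\Big[\tilde g(X(v), C(v)) - \sum_{j=1}^{N(v)} U_{\tilde g}(X_j(v))\Big], \qquad (3.3)$$

recalling that $C(v) = (N(v), X_1(v), \ldots, X_{N(v)}(v))$.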

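We begin by establishing exponential tightness of the family of laws of $M_X$ on the space $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$.

Lemma 3.2. For every $A > 0$ there exists a compact $K \subset \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \notin K \mid |T| = n\} \le -A.$$

Proof. Recall that $Q\{e^{\eta N} \mid a\} < \infty$ for all $\eta > 0$. Hence, given $l \in \mathbb{N}$, we may choose $k(l) \in \mathbb{N}$ so large
that

$$Q\{\exp(l^2 N 1_{\{N > k(l)\}}) \mid a\} \le 2 \quad \text{for all } a \in \mathcal{X}.$$

Using the exponential Chebyshev inequality,

$$\mathbb{P}\Big\{\int_{\{N > k(l)\}} N\, dM_X \ge \frac{1}{l},\ |T| = n\Big\} \le e^{-ln}\, \mathbb{E}\Big\{\exp\Big(l^2 n \int_{\{N > k(l)\}} N\, dM_X\Big),\ |T| = n\Big\}$$

$$= e^{-ln}\, \mathbb{E}\Big\{\prod_{v \in V} \exp\big(l^2 1_{\{N(v) > k(l)\}} N(v)\big),\ |T| = n\Big\}$$

$$\le e^{-ln} \Big(\sup_{a \in \mathcal{X}} Q\{\exp(l^2 N 1_{\{N > k(l)\}}) \mid a\}\Big)^n \le e^{-n(l - \log 2)}.$$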
Now choose $M > A + \log 2$. Define the set

$$K = \Big\{\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*) : \int_{\{N > k(l)\}} N\, d\nu < \frac{1}{l}, \text{ for all } l \ge M\Big\}.$$

As $\{N \le k(l)\} \subset \mathcal{X} \times \mathcal{X}^*$ is compact, the set $K$ is pre-compact in the weak topology, by Prohorov's
criterion. Moreover, since $m(a,c) \le N$, it is easy to see by truncation that for every weakly convergent
sequence $\nu_n \to \nu$ with $\nu_n \in K$, we also have $\lim_{n \to \infty} \int m(a,c)\, \nu_n(b,dc) = \int m(a,c)\, \nu(b,dc)$. Hence, $K$
is even pre-compact in the stronger topology we are using on the space $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$. As

$$\mathbb{P}\{M_X \notin K \mid |T| = n\} \le \frac{1}{\mathbb{P}\{|T| = n\}}\, \frac{1}{1 - e^{-1}} \exp(-n(M - \log 2)),$$

we can use Lemma 3.1 to infer that

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \notin K \mid |T| = n\} \le -A,$$

as required for the proof. □



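Next we derive an upper bound in a variational formulation. Denote by $\mathcal{C}$ the space of bounded
functions on $\mathcal{X} \times \mathcal{X}^*$ and define for each $\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$,

$$\tilde J(\nu) = \sup_{\tilde g \in \mathcal{C}} \Big\{\int \Big[\tilde g(b,c) - \sum_{j=1}^{n} U_{\tilde g}(a_j)\Big]\, \nu(db,dc)\Big\}, \qquad (3.4)$$

where $c = (n, a_1, \ldots, a_n)$.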

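Lemma 3.3. For each closed set $F \subset \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$,

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \in F \mid |T| = n\} \le -\inf_{\nu \in F} \tilde J(\nu).$$

Proof. Fix $\tilde g \in \mathcal{C}$ bounded by some $M > 0$, then also $\int e^{U_{\tilde g}(a)}\, \mu(da) \le e^M$. Define $h : \mathcal{X} \times \mathcal{X}^* \to \mathbb{R}$ by
$h(b,c) = \tilde g(b,c) - \sum_{j=1}^{n} U_{\tilde g}(a_j)$, where as usual $c = (n, a_1, \ldots, a_n)$, and observe that, by (3.3),

$$e^M \ge \tilde{\mathbb{P}}\{|T| = n\} \int e^{U_{\tilde g}(a)}\, \mu(da) = \mathbb{E}\Big\{\exp\Big(\sum_{v \in V} \Big[\tilde g(X(v), C(v)) - \sum_{j=1}^{N(v)} U_{\tilde g}(X_j(v))\Big]\Big) 1_{\{|T| = n\}}\Big\}$$

$$= \mathbb{E}\big\{e^{n\langle h, M_X\rangle}\, 1_{\{|T| = n\}}\big\}.$$

Together with Lemma 3.1 this shows that

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E}\big\{e^{n\langle h, M_X\rangle} \,\big|\, |T| = n\big\} \le 0. \qquad (3.5)$$

In view of (3.2) the same bound (3.5) applies for $h : \mathcal{X} \times \mathcal{X}^* \to \mathbb{R}$ of the form $h(b,c) = \tilde g(b,c) - U_{\tilde g}(b)$.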

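Now fix $\varepsilon > 0$, and let $\tilde J_\varepsilon(\nu) = \min\{\tilde J(\nu), \varepsilon^{-1}\} - \varepsilon$. Suppose first that $\nu \in F$ is shift-invariant. Then,
for any $\tilde g \in \mathcal{C}$,

$$\int \sum_{j=1}^{n} U_{\tilde g}(a_j)\, \nu(db,dc) = \sum_{a \in \mathcal{X}} \sum_{(b,c)} m(a,c)\, \nu(b,c)\, U_{\tilde g}(a) = \sum_{a \in \mathcal{X}} U_{\tilde g}(a)\, \nu_1(a) = \int U_{\tilde g}(b)\, \nu_1(db). \qquad (3.6)$$

Choose $\tilde g_\nu \in \mathcal{C}$ such that $h_\nu(b,c) = \tilde g_\nu(b,c) - U_{\tilde g_\nu}(b)$ satisfies

$$\langle h_\nu, \nu\rangle := \int h_\nu(b,c)\, \nu(db,dc) = \int \Big[\tilde g_\nu(b,c) - \sum_{j=1}^{n} U_{\tilde g_\nu}(a_j)\Big]\, \nu(db,dc) \ge \tilde J_\varepsilon(\nu).$$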
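Since $h_\nu$ is bounded, the mapping $\langle h_\nu, \cdot\,\rangle$ is continuous in $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$. Hence there exists an open
neighbourhood $B_\nu$ of $\nu$ such that

$$\inf_{\tilde\nu \in B_\nu} \langle h_\nu, \tilde\nu\rangle \ge \langle h_\nu, \nu\rangle - \varepsilon \ge \tilde J_\varepsilon(\nu) - \varepsilon.$$

Using the exponential Chebyshev inequality and the remark following (3.5) we obtain that

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \in B_\nu \mid |T| = n\} \le \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E}\big\{e^{n\langle h_\nu, M_X\rangle} \,\big|\, |T| = n\big\} - \tilde J_\varepsilon(\nu) + \varepsilon \le -\inf_{\nu \in F} \tilde J_\varepsilon(\nu) + \varepsilon. \qquad (3.7)$$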

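Now suppose that $\nu$ fails to be shift-invariant. Assume first that there exists $a \in \mathcal{X}$ such that

$$\nu_1(a) < \sum_{(b,c)} m(a,c)\, \nu(b,c). \qquad (3.8)$$

Recall that the mappings $\nu \mapsto \sum_{c} m(a,c)\, \nu(b,c)$ are continuous in our topology. Hence there exist
$\delta > 0$ and a small open neighbourhood $B_\nu \subset \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ of $\nu$ such that

$$\tilde\nu_1(a) < \sum_{(b,c)} m(a,c)\, \tilde\nu(b,c) - \delta, \quad \text{for all } \tilde\nu \in B_\nu. \qquad (3.9)$$

Let $\tilde g \in \mathcal{C}$ be defined by $\tilde g(b,c) = -(\delta\varepsilon)^{-1} 1_a(b)$ and $h(b,c) = \tilde g(b,c) - \sum_{j=1}^{n} U_{\tilde g}(a_j)$. Note that
$U_{\tilde g}(b) = \tilde g(b,c)$ for all $b$ and vanishes unless $b = a$. Hence, by (3.9), for every $\tilde\nu \in B_\nu$ we have that
$\int h\, d\tilde\nu \ge \varepsilon^{-1}$. Then, using the exponential Chebyshev inequality and (3.5),

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \in B_\nu \mid |T| = n\} \le \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E}\big\{e^{n\langle h, M_X\rangle} \,\big|\, |T| = n\big\} - \varepsilon^{-1} \le -\varepsilon^{-1} \le -\inf_{\nu \in F} \tilde J_\varepsilon(\nu). \qquad (3.10)$$

In case the opposite inequality holds in (3.8) the same argument leads to (3.10) if $\tilde g$ is defined as
$\tilde g(b,c) = (\delta\varepsilon)^{-1} 1_a(b)$.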

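Now we use Lemma 3.2 to choose a compact set $K$ with

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \notin K \mid |T| = n\} \le -\varepsilon^{-1}.$$

The set $K \cap F$ is compact and hence it may be covered by finitely many of the sets $B_{\nu_1}, \ldots, B_{\nu_m}$, with
$\nu_i \in F$ for $i = 1, \ldots, m$. Hence,

$$\mathbb{P}\{M_X \in F \mid |T| = n\} \le \sum_{i=1}^{m} \mathbb{P}\{M_X \in B_{\nu_i} \mid |T| = n\} + \mathbb{P}\{M_X \notin K \mid |T| = n\}.$$

Using (3.7) and (3.10) we obtain, for small enough $\varepsilon > 0$, that

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \in F \mid |T| = n\} \le \max_{i=1}^{m} \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \in B_{\nu_i} \mid |T| = n\} \le -\inf_{\nu \in F} \tilde J_\varepsilon(\nu) + \varepsilon.$$

Taking $\varepsilon \downarrow 0$ gives the required statement. □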

We next show that the convex rate function $J$ may replace the function $\tilde J$ of (3.4) in the upper bound
of Lemma 3.3.

Lemma 3.4. The function $J(\cdot)$ is convex and lower semicontinuous on $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$. Moreover,
$J(\nu) \le \tilde J(\nu)$ for any $\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$.
Proof. We start by proving the inequality $J(\nu) \le \tilde J(\nu)$. To this end, suppose first that $\nu \not\ll \nu_1 \otimes Q$.
Then, there exists $(a', c') \in \mathcal{X} \times \mathcal{X}^*$ with $\nu(a', c') > 0$ and $Q\{c' \mid a'\} = 0$. Consequently, $U_{\tilde g} = 0$ for
$\tilde g(b,c) = K 1_{(a',c')}(b,c)$ and any $K$. Considering such $\tilde g$ in (3.4) with $K \uparrow \infty$ we see that $\tilde J(\nu) = \infty$ in
this case.

Suppose now that $\nu$ fails to be shift-invariant, in which case there exists $a \in \mathcal{X}$ such that
$\nu_1(a) \ne \sum_{(b,c) \in \mathcal{X} \times \mathcal{X}^*} m(a,c)\, \nu(b,c)$. Choose $\tilde g(b,c) = K 1_a(b)$, for which $U_{\tilde g}(b) = K 1_a(b)$ and

$$\int \Big[\tilde g(b,c) - \sum_{j=1}^{n} U_{\tilde g}(a_j)\Big]\, \nu(db,dc) = K \Big\{\nu_1(a) - \int m(a,c)\, \nu(db,dc)\Big\} \to \infty,$$

for $|K| \uparrow \infty$, with the sign of $K$ chosen so that the right hand side is positive.

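Finally suppose that $\nu$ is shift-invariant and $\nu \ll \nu_1 \otimes Q$. By the variational characterisation of the
relative entropy, see e.g. [DZ98, Lemma 6.2.13], the definition of $U_{\tilde g}$, Jensen's inequality, and (3.6),

$$H(\nu \,\|\, \nu_1 \otimes Q) = \sup_{\tilde g \in \mathcal{C}} \Big\{\int \tilde g\, d\nu - \log \int e^{\tilde g(a,c)}\, Q\{dc \mid a\}\, \nu_1(da)\Big\}$$

$$= \sup_{\tilde g \in \mathcal{C}} \Big\{\int \tilde g\, d\nu - \log \int e^{U_{\tilde g}(a)}\, \nu_1(da)\Big\} \qquad (3.11)$$

$$\le \sup_{\tilde g \in \mathcal{C}} \Big\{\int \tilde g\, d\nu - \int U_{\tilde g}(a)\, \nu_1(da)\Big\} = \tilde J(\nu).$$

If $\nu, \nu' \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ are both shift-invariant then $\nu_\lambda = \lambda\nu + (1 - \lambda)\nu'$ is also shift-invariant for any
$0 < \lambda < 1$. Moreover, $\nu \mapsto \int m(a,c)\, \nu(b,dc)$ is continuous for each $a, b \in \mathcal{X}$, implying that the set
$S = \{\nu : \nu \text{ is shift-invariant}\}$ is convex and closed in the topology we use on $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$. Note that
if $\tilde g \in \mathcal{C}$, then so is $U_{\tilde g}$ and the mapping $\nu \mapsto \int \tilde g\, d\nu - \log \int e^{U_{\tilde g}(a)}\, \nu_1(da)$ is continuous and convex.
Consequently, the identity (3.11) implies that $\nu \mapsto H(\nu \,\|\, \nu_1 \otimes Q)$ is lower semicontinuous and convex.
For any $\alpha < \infty$, the level set $\{\nu : J(\nu) \le \alpha\}$ is the intersection of the convex, closed sets $S$ and
$\{\nu : H(\nu \,\|\, \nu_1 \otimes Q) \le \alpha\}$. Consequently, $J(\cdot)$ is a convex rate function. □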



3.3 Proof of the lower bound in Theorem 2.2

Recall the definition of the multiplicity $m(a,c)$ of the symbol $a$ in $c$ and of the matrix $A_{\tilde g}$ with index
set $\mathcal{X} \times \mathcal{X}$ associated with the transformed multitype Galton-Watson tree,

$$A_{\tilde g}(a,b) = \sum_{c \in \mathcal{X}^*} \tilde Q\{c \mid b\}\, m(a,c), \quad \text{for } a, b \in \mathcal{X}.$$

By our assumptions the matrix $A_{\tilde g}$, which has the same set of non-zero entries as $A$, is weakly
irreducible. Recall that, by the Perron-Frobenius theorem, see e.g. [DZ98, Theorem 3.1.1], the largest
eigenvalue $\varrho_{\tilde g}$ of the irreducible restriction of $A_{\tilde g}$ to $\mathcal{X}_r$ is real and positive, with strictly positive right
and left eigenvectors. Since $A_{\tilde g}$ is weakly irreducible, the largest eigenvalue of $A_{\tilde g}$ is also $\varrho_{\tilde g}$. Further,
recall that $A_{\tilde g}(a,b) = 0$ whenever $b \in \mathcal{X}_t$ and $a \in \mathcal{X}_r$ or $b < a \in \mathcal{X}_t$, while $\sum_{b \in \mathcal{X}_r} A_{\tilde g}(a,b) > 0$ for any
$a \in \mathcal{X}_t$. Consequently, there exists a unique right eigenvector $u_{\tilde g} \in \mathbb{R}^{\mathcal{X}}$ for the eigenvalue $\varrho_{\tilde g}$ of $A_{\tilde g}$
having strictly positive entries, which add up to one. The next lemma guides the choice of $\tilde g$ associated
with a large deviations lower bound at $\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ for which $J(\nu) < \infty$.
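
To make the objects in this paragraph concrete, here is a small numerical sketch of the mean matrix $A(a,b) = \sum_c Q\{c \mid b\}\, m(a,c)$ and its Perron-Frobenius eigenpair. It is an illustration under assumptions, not the paper's construction: the two-type offspring law is hypothetical, offspring vectors are encoded as tuples of child types, and the eigenpair is found by plain power iteration, which is a swapped-in numerical technique that assumes the matrix is irreducible and aperiodic.

```python
def mean_matrix(Q, types):
    """A[a][b] = sum over c of Q{c | b} * m(a, c), with m(a, c) = c.count(a)."""
    return {a: {b: sum(p * c.count(a) for c, p in Q[b].items()) for b in types}
            for a in types}

def perron_frobenius(A, types, iterations=200):
    """Power iteration for the largest eigenvalue and the right eigenvector,
    normalised so that its entries sum to one."""
    u = {a: 1.0 / len(types) for a in types}
    rho = 0.0
    for _ in range(iterations):
        w = {a: sum(A[a][b] * u[b] for b in types) for a in types}
        rho = sum(w.values())  # equals the eigenvalue once u has converged
        u = {a: w[a] / rho for a in types}
    return rho, u

# Hypothetical critical two-type offspring law (illustration only).
types = [0, 1]
Q = {
    0: {(): 0.5, (0, 1): 0.5},
    1: {(): 0.25, (0,): 0.5, (0, 1): 0.25},
}

A = mean_matrix(Q, types)
rho, u = perron_frobenius(A, types)
```

For this law the mean matrix has rows $(0.5, 0.75)$ and $(0.5, 0.25)$, with eigenvalues $1$ and $-0.25$, so the tree is critical and the normalised right eigenvector is $(0.6, 0.4)$.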

Lemma 3.5. Suppose $\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with $\nu_1$ strictly positive. The following statements are
equivalent.

(i) $\nu$ is shift-invariant and $\nu \ll \nu_1 \otimes Q$.

(ii) There exists a function $\tilde g : \mathcal{X} \times \mathcal{X}^* \to \mathbb{R}$ with $U_{\tilde g} = 0$, such that $\varrho_{\tilde g} = 1$ and the corresponding
Perron-Frobenius eigenvector $u_{\tilde g}$ satisfies $\nu(a,c) = \tilde Q\{c \mid a\}\, u_{\tilde g}(a)$, for every $(a,c) \in \mathcal{X} \times \mathcal{X}^*$.

Moreover, if (ii) holds, then $H(\nu \,\|\, \nu_1 \otimes Q) = \int \tilde g(b,c)\, \nu(db,dc)$.

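Proof. Suppose first that $\nu$ is shift-invariant and $\nu \ll \nu_1 \otimes Q$. Define $\tilde g$ by

$$\tilde g(a,c) = \log\Big(\frac{\nu(a,c)}{\nu_1(a)\, Q\{c \mid a\}}\Big) \quad \text{when } Q\{c \mid a\} > 0, \qquad (3.12)$$

and otherwise $\tilde g(a,c) = 0$. Then, for all $a \in \mathcal{X}$,

$$\sum_{c \in \mathcal{X}^*} Q\{c \mid a\}\, e^{\tilde g(a,c)} = 1,$$

and hence $U_{\tilde g}(a) = 0$. We infer that

$$\tilde Q\{c \mid a\} = e^{\tilde g(a,c)}\, Q\{c \mid a\}. \qquad (3.13)$$

Using this and the definition (3.12) of $\tilde g$ we see that

$$\nu(a,c) = e^{\tilde g(a,c)}\, Q\{c \mid a\}\, \nu_1(a) = \tilde Q\{c \mid a\}\, \nu_1(a). \qquad (3.14)$$

To identify $\varrho_{\tilde g}$ by the Perron-Frobenius theorem, we only have to find the eigenvalue corresponding to a
strictly positive (right) eigenvector, which turns out to be $\nu_1$. Indeed, for all $a \in \mathcal{X}$,

$$\sum_{b \in \mathcal{X}} A_{\tilde g}(a,b)\, \nu_1(b) = \sum_{(b,c) \in \mathcal{X} \times \mathcal{X}^*} \tilde Q\{c \mid b\}\, m(a,c)\, \nu_1(b) = \sum_{(b,c) \in \mathcal{X} \times \mathcal{X}^*} \nu(b,c)\, m(a,c) = \nu_1(a),$$

using the shift-invariance of $\nu$ in the final step. This shows that $\varrho_{\tilde g} = 1$ and, by uniqueness of the
eigenvector, $\nu_1 = u_{\tilde g}$. Hence (ii) follows from (3.14).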

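Conversely, fix $\tilde g$ for which $\varrho_{\tilde g} = 1$ and (ii) holds. Summing over $c \in \mathcal{X}^*$ in (ii) we have that $\nu_1 = u_{\tilde g}$
and hence $\nu \ll \nu_1 \otimes Q$. Moreover, for all $a \in \mathcal{X}$,

$$\nu_1(a) = \sum_{b \in \mathcal{X}} A_{\tilde g}(a,b)\, \nu_1(b) = \sum_{(b,c) \in \mathcal{X} \times \mathcal{X}^*} m(a,c)\, \tilde Q\{c \mid b\}\, \nu_1(b) = \sum_{(b,c) \in \mathcal{X} \times \mathcal{X}^*} m(a,c)\, \nu(b,c),$$

hence $\nu$ is shift-invariant. Moreover, using $\nu(a,c) = \tilde Q\{c \mid a\}\, \nu_1(a)$ and the definition of $\tilde Q$, we get

$$H(\nu \,\|\, \nu_1 \otimes Q) = \sum_{(a,c)} \nu(a,c) \log\Big(\frac{\tilde Q\{c \mid a\}}{Q\{c \mid a\}}\Big) = \int \tilde g(a,c)\, \nu(da,dc),$$

which completes the proof. □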
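
The identities in Lemma 3.5 can be checked on a small example. The sketch below is a sanity check under assumptions, not part of the proof: the offspring law `Q`, the conditional law `nu_cond` and the marginal `nu1` are hypothetical numbers chosen so that $\nu$ is shift-invariant and has the same support as $\nu_1 \otimes Q$, and offspring vectors are encoded as tuples of child types.

```python
import math

# Hypothetical two-type offspring law Q{c | a}; c is the tuple of child types.
Q = {
    0: {(): 0.5, (0, 1): 0.5},
    1: {(): 0.25, (0,): 0.5, (0, 1): 0.25},
}
# Hypothetical conditional law nu(c | a) with the same support as Q, chosen so
# that its mean matrix has eigenvalue 1 with right eigenvector nu1.
nu_cond = {
    0: {(): 0.5, (0, 1): 0.5},
    1: {(): 0.4, (0,): 0.2, (0, 1): 0.4},
}
nu1 = {0: 6 / 11, 1: 5 / 11}
nu = {(a, c): nu1[a] * p for a, law in nu_cond.items() for c, p in law.items()}

def g_tilde(a, c):
    # Formula (3.12); every c used here has Q{c | a} > 0.
    return math.log(nu[(a, c)] / (nu1[a] * Q[a][c]))

# Shift-invariance: sum over (b, c) of m(a, c) nu(b, c) should equal nu1(a).
shifted = {a: sum(c.count(a) * mass for (b, c), mass in nu.items()) for a in nu1}
# U(a) = log sum_c Q{c | a} exp(g(a, c)), which should vanish.
U = {a: math.log(sum(p * math.exp(g_tilde(a, c)) for c, p in Q[a].items())) for a in Q}
# Relative entropy H(nu || nu1 x Q) and the integral of g against nu.
H = sum(mass * math.log(mass / (nu1[a] * Q[a][c])) for (a, c), mass in nu.items())
integral_g = sum(mass * g_tilde(a, c) for (a, c), mass in nu.items())
```

Here `shifted` reproduces `nu1`, `U` vanishes up to rounding, and `H` agrees with `integral_g`, matching the final claim of the lemma.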

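The next lemma is key to the proof of the lower bound in Theorem 2.2. It allows us to focus on
those shift-invariant $\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with strictly positive first marginal, for which $\tilde g$ of Lemma 3.5
is bounded above. If $\nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ and $a \in \mathcal{X}$ we write $\nu(\,\cdot \mid a) = \nu(a, \cdot\,)/\nu_1(a)$.

Lemma 3.6. Suppose $O$ is an open subset of $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ and $\nu \in O$ with $J(\nu) < \infty$. Then, for any
$\delta > 0$, there exists $\tilde\nu \in O$ with $J(\tilde\nu) \le J(\nu) + \delta$, such that $\tilde\nu_1$ is strictly positive and $\tilde\nu(c \mid a) \le Q\{c \mid a\}/y$
for some $y > 0$ and all $(a,c) \in \mathcal{X} \times \mathcal{X}^*$.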

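Proof. Recall our assumption that $X$ is weakly irreducible and critical. This implies the existence
of a strictly positive probability vector $u_0$ on $\mathcal{X}$ such that $\nu^*(a,c) = Q\{c \mid a\}\, u_0(a) \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ is
shift-invariant with $\nu^*_1(a) = u_0(a)$ and $J(\nu^*) = 0$. Fixing $\nu \in O$ with $J(\nu) < \infty$, we have for each
$0 < \varepsilon < 1$ that $\nu_\varepsilon = (1 - \varepsilon)\nu + \varepsilon\nu^*$ is shift-invariant in $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ with $(\nu_\varepsilon)_1$ strictly positive and
$\nu_\varepsilon(c \mid a) = 0$ exactly for those values $(a,c) \in \mathcal{X} \times \mathcal{X}^*$ where $Q\{c \mid a\} = 0$. By convexity of $J(\cdot)$ we know
that $J(\nu_\varepsilon) \le (1 - \varepsilon) J(\nu)$. Further, $\int f\, d\nu_\varepsilon \to \int f\, d\nu$ as $\varepsilon \downarrow 0$, for any $f : \mathcal{X} \times \mathcal{X}^* \to \mathbb{R}$ which is either
bounded or satisfies $f(b', c) = m(a,c)\, 1_b(b')$ for some $a, b \in \mathcal{X}$. As $O$ is open in $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$, it follows
that $\nu_\varepsilon \in O$ for all $\varepsilon > 0$ small enough.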

In view of the above, we may and shall assume hereafter that $\nu_1$ is strictly positive and $\nu(c \mid a) = 0$
exactly for those values $(a,c) \in \mathcal{X} \times \mathcal{X}^*$ where $Q\{c \mid a\} = 0$. In particular, the matrix $A_{0,0}$ given by

$$A_{0,0}(a,b) = \sum_{c \in \mathcal{X}^*} m(a,c)\, \nu(c \mid b), \quad \text{for } a, b \in \mathcal{X},$$

has nonnegative entries and is weakly irreducible. Its Perron-Frobenius eigenvalue, denoted $\varrho(A_{0,0})$,
equals 1, and the corresponding right eigenvector $u_{0,0}$ equals $\nu_1$ and hence is a strictly positive
probability vector on $\mathcal{X}$. The corresponding left eigenvector $v_{0,0}$ is a probability vector which is strictly
positive on $\mathcal{X}_r$. Clearly, for each $b \in \mathcal{X}_r$ there exists $c_1 = c_1(b)$ such that $Q\{c_1 \mid b\} > 0$, hence also
$\nu(c_1 \mid b) > 0$. Recall that for $b \in \mathcal{X}_t$ we have $Q\{c \mid b\} > 0$ (and hence $\nu(c \mid b) > 0$) for only finitely many
$c \in \mathcal{X}^*$. Consequently, $\nu(c \mid b) \le Q\{c \mid b\}/y$ for some $y > 0$ and all $c \in \mathcal{X}^*$, $b \in \mathcal{X}_t$. The proof of the
lemma is complete if the same applies for all $b \in \mathcal{X}_r$. Assuming hereafter that this is not the case,
with $\sum_{a \in \mathcal{X}_t} m(a,c)$ uniformly bounded under $Q$, there must exist $b_0 \in \mathcal{X}_r$ and $c_2 = c_2(b_0) \in \mathcal{X}^*$ such
that $Q\{c_2 \mid b_0\} > 0$ (and hence also $\nu(c_2 \mid b_0) > 0$), with $\sum_{a \in \mathcal{X}_r} m(a, c_2)$ large enough to guarantee
that $\sum_{a \in \mathcal{X}} v_{0,0}(a)\big(m(a, c_2) - m(a, c_1(b_0))\big) > 0$. Let $c_1(b)$ be arbitrary for $b \in \mathcal{X}_t$, and $c_2 = c_1(b)$ for
all $b \ne b_0$.

Using these $c_1$ and $c_2$ we next construct probability measures $\nu_{x,y}(\,\cdot \mid b)$ on $\mathcal{X}^*$ for $0 \le y < y_0$ and
$|x| < 1/2$, such that for each $b \in \mathcal{X}$ and $c \in \mathcal{X}^*$ we have

• $\nu_{x,y}(c \mid b) \le Q\{c \mid b\}/y$,

• $\nu_{x,y}(c \mid b) \to \nu_{0,0}(c \mid b) = \nu(c \mid b)$ as $x \to 0$ and $y \downarrow 0$,

• $\nu_{x,y}(c \mid b) = 0$ if and only if $\nu(c \mid b) = 0$.

Further,

$$\limsup_{x \to 0,\ y \downarrow 0} H\big(\nu_{x,y}(\,\cdot \mid b) \,\big\|\, Q\{\,\cdot \mid b\}\big) \le H\big(\nu_{0,0}(\,\cdot \mid b) \,\big\|\, Q\{\,\cdot \mid b\}\big), \qquad (3.15)$$

and $A_{x,y}(a,b) = \sum_{c} m(a,c)\, \nu_{x,y}(c \mid b) \to A_{0,0}(a,b)$ for any $a, b \in \mathcal{X}$. Note that $A_{x,y}(a,b) = 0$ if
and only if $A_{0,0}(a,b) = 0$, so with $A_{0,0}$ weakly irreducible, the same applies to $A_{x,y}$. The function
$f(x,y) = \varrho(A_{x,y})$ is thus continuous in this range of $(x,y)$, as is also the strictly positive
Perron-Frobenius right eigenvector $u_{x,y}$ of $A_{x,y}$, normalized to be a probability vector on $\mathcal{X}$. Our construction
is such that $A_{x,0} = A_{0,0} + xB$ where $B(a,b) = \nu(c_2 \mid b)\, \nu(c_1 \mid b)\big(m(a,c_2) - m(a,c_1)\big)$. Therefore, $f(x,0)$
is continuously differentiable at $x = 0$ with

$$\frac{\partial f}{\partial x}(0,0) = \frac{\sum_{a,b} v_{0,0}(a)\, B(a,b)\, u_{0,0}(b)}{\sum_{a} v_{0,0}(a)\, u_{0,0}(a)} > 0.$$

By the implicit function theorem, there exist $x(y) \to 0$ as $y \downarrow 0$ such that $f(x(y), y) = f(0,0) = 1$ for
all $y > 0$ small enough. It follows that $\nu_{x,y}(b,c) = \nu_{x,y}(c \mid b)\, u_{x,y}(b)$ defines a shift-invariant probability
measure $\nu_{x,y} \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$ for $x = x(y)$ and all $y > 0$ small enough. Moreover,

$$\int m(a,c)\, \nu_{x(y),y}(b,dc) = A_{x(y),y}(a,b)\, u_{x(y),y}(b) \to A_{0,0}(a,b)\, u_{0,0}(b) = \int m(a,c)\, \nu(b,dc),$$

for each $a, b \in \mathcal{X}$ and $y \downarrow 0$, implying the convergence of $\nu_{x(y),y}$ to $\nu$ in the topology of $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$,
and by (3.15) and shift-invariance, also

$$\limsup_{y \downarrow 0} J(\nu_{x(y),y}) = \limsup_{y \downarrow 0} \sum_{b \in \mathcal{X}} u_{x(y),y}(b)\, H\big(\nu_{x(y),y}(\,\cdot \mid b) \,\big\|\, Q\{\,\cdot \mid b\}\big)$$

$$\le \sum_{b \in \mathcal{X}} u_{0,0}(b)\, H\big(\nu_{0,0}(\,\cdot \mid b) \,\big\|\, Q\{\,\cdot \mid b\}\big) = J(\nu),$$

which completes the proof of the lemma subject to the construction of $\nu_{x,y}(\,\cdot \mid b)$.

We now turn to this construction. For any $|x| < 1/2$ we define the probability measure

$$\nu_{x,0}(c \mid b) = \nu(c \mid b) + x\, \nu(c_2 \mid b)\, \nu(c_1 \mid b)\big(1_{\{c = c_2\}} - 1_{\{c = c_1\}}\big).$$

In particular, $\nu_{x,0}(c \mid b) = 0$ exactly where $\nu(c \mid b) = 0$ and $A_{x,0} = A_{0,0} + xB$ as stated. Let
$y_0 = Q\{c_2 \mid b_0\} \min_{b \in \mathcal{X}_r} Q\{c_1 \mid b\} > 0$, further reducing $y_0$ as needed to ensure that $\nu(c \mid b) \le Q\{c \mid b\}/y_0$ for
any $c \in \mathcal{X}^*$ and $b \in \mathcal{X}_t$. For any $0 < y < y_0$ define the probability measures $\nu_{x,y}(\,\cdot \mid b)$ by

$$\nu_{x,y}(c \mid b) = \min\big(\nu_{x,0}(c \mid b),\ Q\{c \mid b\}/y\big) \quad \text{for } c \ne c_1,$$

$$\nu_{x,y}(c_1 \mid b) = \nu_{x,0}(c_1 \mid b) + \sum_{c \ne c_1} \big(\nu(c \mid b) - Q\{c \mid b\}/y\big)_+,$$

with $(\cdot)_+$ indicating the positive part. Our choice of $y_0$ results in $\nu_{x,y}(\,\cdot \mid b) = \nu(\,\cdot \mid b)$ whenever $b \in \mathcal{X}_t$
and further guarantees that

$$\nu_{x,y}(c_2 \mid b_0) = \nu_{x,0}(c_2 \mid b_0) \le Q\{c_2 \mid b_0\}/y$$

and $\nu_{x,y}(c_1 \mid b) \le 1 \le Q\{c_1 \mid b\}/y$ for all $b \in \mathcal{X}_r$, $|x| < 1/2$ and $0 < y < y_0$. Hence we have as stated
that $\nu_{x,y}(c \mid b) \le Q\{c \mid b\}/y$ for all $c \in \mathcal{X}^*$, and $\nu_{x,y}(c \mid b) = 0$ if and only if $\nu(c \mid b) = 0$. Moreover,
$A_{x,y} = A_{x,0} + E_y$, for

$$E_y(a,b) = \sum_{c \in \mathcal{X}^*} \big(m(a,c_1) - m(a,c)\big)\big(\nu(c \mid b) - Q\{c \mid b\}/y\big)_+,$$

in particular, $E_y(a,b) = 0$ for $b \in \mathcal{X}_t$. Writing $n(c) = n$ if $c \in \mathcal{X}^n$, recall that $\sum_{c} n(c)\, \nu(c \mid b) =
\sum_{a} A_{0,0}(a,b) < \infty$ for all $b \in \mathcal{X}$, so by dominated convergence

$$|E_y(a,b)| \le \sum_{c \in \mathcal{X}^*} \big(n(c_1) + n(c)\big)\, \nu(c \mid b)\, 1_{\{\nu(c \mid b) > Q\{c \mid b\}/y\}} \to 0 \quad \text{as } y \downarrow 0,$$

and consequently, as stated, each entry of $A_{x,y}$ is continuous in $(x,y) \in (-1/2, 1/2) \times [0, y_0)$. By
the same argument, $\sum_{c \ne c_1} \big(\nu(c \mid b) - Q\{c \mid b\}/y\big)_+ \to 0$ as $y \downarrow 0$, implying the pointwise convergence
$\nu_{x,y}(c \mid b) \to \nu(c \mid b)$ for each $(b,c) \in \mathcal{X} \times \mathcal{X}^*$. Turning to (3.15), note that it suffices to consider only
$b \in \mathcal{X}_r$. Recall that for any $q > 0$ the function $z \log(z/q)$ increases in $z \in [q, 1]$, and if $\nu_{x,y}(c \mid b) \ne
\nu_{0,0}(c \mid b)$ and $c \ne c_1, c_2$, then necessarily $0 < Q\{c \mid b\} \le \nu_{x,y}(c \mid b) \le \nu_{0,0}(c \mid b) \le 1$. Consequently,

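$$\sum_{c \ne c_1, c_2} \nu_{x,y}(c \mid b) \log\frac{\nu_{x,y}(c \mid b)}{Q\{c \mid b\}} \le \sum_{c \ne c_1, c_2} \nu_{0,0}(c \mid b) \log\frac{\nu_{0,0}(c \mid b)}{Q\{c \mid b\}},$$

yielding (3.15) since $\nu_{x,y}(c_i \mid b) \to \nu_{0,0}(c_i \mid b)$ and $Q\{c_i \mid b\} > 0$ for $i = 1, 2$ and $b \in \mathcal{X}_r$. □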

Using Lemma 3.6 we now establish the lower bound in Theorem 2.2.

Lemma 3.7. For each open set $O \subset \mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$,

$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \in O \mid |T| = n\} \ge -\inf_{\nu \in O} J(\nu).$$
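Proof. Suppose that $\nu$ is an approximate minimizer on the right hand side. We can assume without
loss of generality that $J(\nu) < \infty$, hence $\nu$ is shift-invariant with $\nu \ll \nu_1 \otimes Q$. By Lemma 3.6 we may
and shall assume in addition that $\nu_1$ is strictly positive and the function $\tilde g$ associated to $\nu$ via (3.12)
is bounded from above. Recall from Lemma 3.5 that $\varrho_{\tilde g} = 1$, and the corresponding Perron-Frobenius
eigenvector $u_{\tilde g}$ satisfies

$$\nu(a,c) = \tilde Q\{c \mid a\}\, u_{\tilde g}(a), \quad \text{for every } (a,c) \in \mathcal{X} \times \mathcal{X}^*,$$

and further that $H(\nu \,\|\, \nu_1 \otimes Q) = \int \tilde g(b,c)\, \nu(db,dc)$. It thus suffices to show that

$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \in O \mid |T| = n\} \ge -\int \tilde g(b,c)\, \nu(db,dc).$$

Since $\tilde g$ is bounded above, fixing $\varepsilon > 0$ we can choose an open set $\tilde O \subset O$ such that $\nu \in \tilde O$ and
$\langle \tilde g, \tilde\nu\rangle \le \langle \tilde g, \nu\rangle + \varepsilon$ for all $\tilde\nu \in \tilde O$. We use the transformed probability measures $\tilde{\mathbb{P}}$ and the formula (3.2)
for their density, to get

$$\mathbb{P}\{M_X \in O,\ |T| = n\} \ge \tilde{\mathbb{E}}\Big\{\frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(X)\, 1_{\{M_X \in \tilde O\}}\, 1_{\{|T| = n\}}\Big\}$$

$$= \tilde{\mathbb{E}}\Big\{\exp\Big(-\sum_{v \in V} \tilde g(X(v), C(v))\Big)\, 1_{\{M_X \in \tilde O\}}\, 1_{\{|T| = n\}}\Big\}$$

$$\ge \exp\big(-n\langle \tilde g, \nu\rangle - n\varepsilon\big) \times \tilde{\mathbb{P}}\{M_X \in \tilde O,\ |T| = n\}.$$

Dividing by $\mathbb{P}\{|T| = n\}$, taking logarithms, dividing by $n$ and recalling Lemma 3.1 gives

$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\{M_X \in O \mid |T| = n\} \ge -\langle \tilde g, \nu\rangle - \varepsilon + \liminf_{n \to \infty} \frac{1}{n} \log \tilde{\mathbb{P}}\{M_X \in \tilde O \mid |T| = n\}.$$

The result follows once we show that

$$\limsup_{n \to \infty} \frac{1}{n} \log \tilde{\mathbb{P}}\{M_X \notin \tilde O \mid |T| = n\} < 0. \qquad (3.16)$$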

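We use the upper bound (but now with the law $\mathbb{P}$ replaced by $\tilde{\mathbb{P}}$) to establish (3.16). Indeed, since $\tilde g$
is bounded from above, we have $\tilde Q\{e^{\eta N} \mid a\} < \infty$ for all $a \in \mathcal{X}$ and $\eta > 0$. So, denoting

$$\hat J(\nu) = \begin{cases} H(\nu \,\|\, \nu_1 \otimes \tilde Q) & \text{if } \nu \text{ is shift-invariant}, \\ \infty & \text{otherwise}, \end{cases}$$

the upper bound gives

$$\limsup_{n \to \infty} \frac{1}{n} \log \tilde{\mathbb{P}}\{M_X \notin \tilde O \mid |T| = n\} \le -\inf_{\tilde\nu \in K} \hat J(\tilde\nu),$$

where $K \subset \tilde O^c$ is a compact subset of $\mathcal{M}(\mathcal{X} \times \mathcal{X}^*)$. It suffices to show that the infimum is positive.
Suppose, for contradiction, that there exists a sequence $\nu_n \in K$ with $\hat J(\nu_n) \downarrow 0$. By compactness of $K$
and lower semicontinuity of $\nu \mapsto \hat J(\nu)$, we can extract a limit point $\tilde\nu \in K$ with $\hat J(\tilde\nu) = 0$, and
hence $\tilde\nu$ is shift-invariant and $H(\tilde\nu \,\|\, \tilde\nu_1 \otimes \tilde Q) = 0$. This implies that $\tilde\nu(a,c) = \tilde Q\{c \mid a\}\, \tilde\nu_1(a)$, for every
$(a,c) \in \mathcal{X} \times \mathcal{X}^*$. Then, using shift-invariance of $\tilde\nu$, for any $b \in \mathcal{X}$,

$$\sum_{(a,c) \in \mathcal{X} \times \mathcal{X}^*} \tilde Q\{c \mid a\}\, m(b,c)\, \tilde\nu_1(a) = \sum_{(a,c) \in \mathcal{X} \times \mathcal{X}^*} \tilde\nu(a,c)\, m(b,c) = \tilde\nu_1(b).$$

By the uniqueness of the Perron-Frobenius eigenvector we infer that $\tilde\nu_1 = u_{\tilde g} = \nu_1$ and this implies
$\tilde\nu = \nu$, which contradicts $\tilde\nu \in K$. □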
We complete the proof of Theorem 2.2 by noting that the rate function $J$ has compact level sets, i.e.
is a good rate function. This follows from abstract considerations as stated, e.g., in [DZ98, Theorem
1.2.18].

3.4 Proof of Theorem 2.1

Note that $X$ is an irreducible, critical multitype Galton-Watson tree with offspring law

$$Q\{c \mid b\} = p(n) \prod_{i=1}^{n} Q\{a_i \mid b\}, \quad \text{for } c = (n, a_1, \ldots, a_n),$$

such that all exponential moments are finite. We derive Theorem 2.1 from Theorem 2.2 by applying
the contraction principle to the continuous linear mapping $F : \mathcal{M}(\mathcal{X} \times \mathcal{X}^*) \to \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$, defined by

$$F(\nu)(a,b) = \sum_{c \in \mathcal{X}^*} m(b,c)\, \nu(a,c) \quad \text{for all } \nu \in \mathcal{M}(\mathcal{X} \times \mathcal{X}^*) \text{ and } a, b \in \mathcal{X}.$$

Indeed, Theorem 2.2 implies the large deviation principle for $F(M_X)$ conditioned on $\{|T| = n\}$ with
the good rate function $I(\mu) = \inf\{J(\nu) : F(\nu) = \mu\}$, see for example [DZ98, Theorem 4.2.1]. Convexity
of $I$ follows easily from the linearity of $F$ and convexity of $J$. It is easy to see that on $\{|T| = n\}$
we have $L_X = \frac{n}{n-1} F(M_X)$. It follows that conditioned on $\{|T| = n\}$ the random variables $L_X$ are
exponentially equivalent to $F(M_X)$, hence $L_X$ satisfy the same large deviation principle as $F(M_X)$,
see [DZ98, Theorem 4.2.13]. Without loss of generality we restrict the space for the large deviation
principle of $L_X$ to the set of all probability vectors on $\mathcal{X} \times \mathcal{X}$, see [DZ98, Lemma 4.1.5(b)].
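
The relation between the empirical offspring measure and the empirical pair measure used above can be verified on a small tree. The sketch below is illustrative only and rests on assumptions: the five-vertex tree is hypothetical, `empirical_measures` and `contract` are made-up helper names, offspring vectors are encoded as tuples of child types, and $L_X$ is taken to be the edge-type frequencies normalised by the $n - 1$ edges.

```python
from collections import Counter

def empirical_measures(types, children):
    """types[v] is the type of vertex v (vertex 0 is the root) and children[v]
    the ordered list of its children. Returns (M, L): the offspring measure,
    normalised by the n vertices, and the pair measure, normalised by n - 1 edges."""
    n = len(types)
    M, pairs = Counter(), Counter()
    for v in range(n):
        c = tuple(types[w] for w in children[v])
        M[(types[v], c)] += 1.0 / n
        for w in children[v]:
            pairs[(types[v], types[w])] += 1
    L = {edge: count / (n - 1) for edge, count in pairs.items()}
    return dict(M), L

def contract(nu):
    """F(nu)(a, b) = sum over c of m(b, c) * nu(a, c)."""
    out = Counter()
    for (a, c), mass in nu.items():
        for b in set(c):
            out[(a, b)] += c.count(b) * mass
    return dict(out)

# Hypothetical typed tree with n = 5 vertices.
types = ["A", "B", "A", "B", "B"]
children = [[1, 2], [3], [4], [], []]

M, L = empirical_measures(types, children)
FM = contract(M)
n = len(types)
rescaled = {edge: n / (n - 1) * mass for edge, mass in FM.items()}
```

Here `rescaled` coincides with `L`: both assign mass $1/2$ to the pair (A, B) and $1/4$ each to (A, A) and (B, B). The factor $n/(n-1)$ tends to one, which is the source of the exponential equivalence of $L_X$ and $F(M_X)$.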

Turning to the proof of (2.2), recall that $\nu$ is shift-invariant if and only if $\sum_{a} F(\nu)(a,b) = \nu_1(b)$ for
all $b \in \mathcal{X}$. Hence, if also $F(\nu) = \mu$, then necessarily $\nu_1 = \mu_2$ and consequently,

$$I(\mu) = \inf\{H(\nu \,\|\, \nu_1 \otimes Q) : F(\nu) = \mu,\ \nu_1 = \mu_2\}.$$

Note that $\nu_1(a) = 0$ yields $\sum_{b} F(\nu)(a,b) = 0$. Hence if $\mu_1(a) > 0 = \mu_2(a)$ for some $a \in \mathcal{X}$ then
$\{\nu : F(\nu) = \mu,\ \nu_1 = \mu_2\}$ is an empty set, and therefore $I(\mu) = \infty$. Assuming hereafter that $\mu_1 \ll \mu_2$,
it is not hard to check that

$$I(\mu) = \sum_{a \in \mathcal{X}} \mu_2(a)\, I\Big(\frac{\mu(a, \cdot\,)}{\mu_2(a)},\ Q\{\,\cdot \mid a\}\Big), \qquad (3.17)$$

where for $\phi : \mathcal{X} \to \mathbb{R}_+$ and $q \in \mathcal{M}(\mathcal{X}^*)$,

$$I(\phi, q) = \inf\Big\{H(\tilde\nu \,\|\, q) : \tilde\nu \in \mathcal{M}(\mathcal{X}^*),\ \phi(b) = \sum_{c} m(b,c)\, \tilde\nu(c) \text{ for all } b \in \mathcal{X}\Big\}. \qquad (3.18)$$

Suppose now that $q(c) = p(n) \prod_{i=1}^{n} \hat q(a_i)$ for all $c = (n, a_1, \ldots, a_n)$, where $\hat q(\cdot)$ is a probability vector
on $\mathcal{X}$ and $p(\cdot)$ a probability measure with mean one on the nonnegative integers, whose exponential
moments are all finite. With $z = \sum_{b} \phi(b)$ we show next that

$$I(\phi, q) = z H(\phi/z \,\|\, \hat q) + I_p(z). \qquad (3.19)$$

Once this is done, we combine (3.19) for $\hat q(\cdot) = Q\{\,\cdot \mid a\}$ and $z = \mu_1(a)/\mu_2(a)$ with the representation
(3.17) of $I(\mu)$, which directly yields the formula (2.2), thus completing the proof of the theorem.

To prove (3.19), suppose first that $z = 0$, i.e. $\phi(b) = 0$ for all $b \in \mathcal{X}$. In this case, $\tilde\nu((0, \emptyset)) = 1$ is
the only possible measure in (3.18), leading to $I(\phi, q) = -\log q((0, \emptyset)) = -\log p(0)$, whereas it follows
from (2.1) that $I_p(0) = -\log p(0)$, establishing (3.19) for such $\phi(\cdot)$. Assume hereafter that $z > 0$. Now
the possible measures $\tilde\nu(\cdot)$ in (3.18) are of the form $\tilde\nu(c) = s(n)\, \tilde\nu_n(a_1, \ldots, a_n)$ for $c = (n, a_1, \ldots, a_n)$,
with $\tilde\nu_0 = 1$, where $s(\cdot)$ is a probability measure on the nonnegative integers whose mean is $z$, and
$\tilde\nu_n(\cdot)$, $n \ge 1$, are probability measures on $\mathcal{X}^n$ with marginals $\tilde\nu_{n,i}(\cdot)$ such that

$$\phi(b) = \sum_{n=1}^{\infty} s(n) \sum_{i=1}^{n} \tilde\nu_{n,i}(b) \quad \text{for all } b \in \mathcal{X}. \qquad (3.20)$$

By the assumed structure of $q(\cdot)$ we have for such $\tilde\nu(\cdot)$ that

$$H(\tilde\nu \,\|\, q) = \sum_{n=1}^{\infty} s(n)\, H(\tilde\nu_n \,\|\, \hat q^n) + H(s \,\|\, p),$$

where $\hat q^n$ denotes the product measure on $\mathcal{X}^n$ with equal marginals $\hat q$. Recall that

$$\sum_{n=1}^{\infty} s(n)\, H(\tilde\nu_n \,\|\, \hat q^n) \ge \sum_{n=1}^{\infty} s(n) \sum_{i=1}^{n} H(\tilde\nu_{n,i} \,\|\, \hat q) \ge z\, H\Big(\frac{1}{z} \sum_{n=1}^{\infty} s(n) \sum_{i=1}^{n} \tilde\nu_{n,i} \,\Big\|\, \hat q\Big),$$

with equality whenever $\tilde\nu_n = \prod_{i=1}^{n} \tilde\nu_{n,i}$ and $\tilde\nu_{n,i}$ are independent of $n$ and $i$ (see [DZ98, Lemma 7.3.25]
for the first inequality, with the second inequality following by convexity of $H(\,\cdot \,\|\, \hat q)$ and the fact that
$\sum_{n} s(n)\, n = z$). So, in view of (3.20),

$$H(\tilde\nu \,\|\, q) \ge z H(\phi/z \,\|\, \hat q) + H(s \,\|\, p), \qquad (3.21)$$

with equality when $\tilde\nu_n = (z^{-1}\phi)^n$ for all $n \ge 1$. Recall that with all exponential moments of $p(\cdot)$ finite,
$I_p(z) = \inf\{H(s \,\|\, p) : s(\cdot) \text{ a probability measure on } \{0, 1, \ldots\} \text{ and } \sum_{n} s(n)\, n = z\}$ (see [DZ98, (2.1.27)]
for a similar identity). Combining this with (3.21) leads to (3.19) and completes our proof.
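
The variational description of $I_p(z)$ recalled above can be explored numerically. The sketch below is an illustration under assumptions rather than the paper's method: it takes $p$ to be a Poisson law with mean one truncated at 60 terms (a hypothetical choice), searches the exponentially tilted family $s_\theta(n) \propto p(n) e^{\theta n}$ by bisection for the member with mean $z$, and evaluates $H(s_\theta \,\|\, p)$. For an untruncated Poisson law with mean one the closed form would be $z \log z - z + 1$.

```python
import math

def tilted(p, theta):
    """Exponentially tilted law s(n) proportional to p(n) * exp(theta * n)."""
    w = [pn * math.exp(theta * n) for n, pn in enumerate(p)]
    total = sum(w)
    return [x / total for x in w]

def mean(s):
    return sum(n * sn for n, sn in enumerate(s))

def relative_entropy(s, p):
    """H(s || p) for laws on {0, ..., len(p) - 1}, assuming p(n) > 0 for all n."""
    return sum(sn * math.log(sn / pn) for sn, pn in zip(s, p) if sn > 0)

def rate(p, z, lo=-5.0, hi=5.0):
    """H(s_theta || p) for the tilted law with mean z. Bisection works because
    the mean of the tilted law is increasing in theta."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(tilted(p, mid)) < z:
            lo = mid
        else:
            hi = mid
    return relative_entropy(tilted(p, (lo + hi) / 2), p)

# Hypothetical offspring law: Poisson with mean one, truncated at 60 terms.
p_poisson = [math.exp(-1.0 - math.lgamma(n + 1)) for n in range(60)]

rate_at_2 = rate(p_poisson, 2.0)
rate_at_1 = rate(p_poisson, 1.0)
```

The value at $z = 1$ vanishes, as it must at the mean of $p$, and the value at $z = 2$ agrees with $2 \log 2 - 1$ up to the negligible truncation error.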

3.5 Proof of Theorem 2.3

In the first step we extend the result of Theorem 2.2 to $k$-generation empirical offspring measures, for
each $k \ge 2$, in case $Q$ is irreducible and the offspring size is bounded by some non-random $N_0 < \infty$.

For each $k \ge 0$, let $\mathcal{X}(k)$ be the finite set of typed trees with height at most $k$ and maximal degree $N_0 + 1$,
equipped with the discrete topology (in particular, $\mathcal{X}(0) = \mathcal{X}$). Let $\pi_k : \mathbb{X} \to \mathcal{X}(k)$ be the canonical
projection obtained by removing all vertices in generations exceeding $k$ and $\pi_{k,l} : \mathcal{X}(k) \to \mathcal{X}(l)$, $k \ge l$,
the projections obtained by removing all vertices in generations exceeding $l$.

If $X$ is a finite typed tree and $v$ is a vertex in this tree, we denote by $X_v$ the subtree rooted in $v$ and
let the $k$-generation empirical offspring measures $M^k_X$ associated to $X$ be defined as

$$M^k_X(b) = \frac{1}{|T|} \sum_{v \in V} \delta_{\pi_k(X_v)}(b), \quad \text{for all } b \in \mathcal{X}(k)$$

(for example $M^1_X(b) = M_X(a,c)$ where $b \in \mathcal{X}(1)$ has root of type $a$ with $n$ children of types $a_1, \ldots, a_n$
and $c = (n, a_1, \ldots, a_n)$). Given $a \in \mathcal{X}(k-1)$ and $b \in \mathcal{X}(k)$ we write $m_k(a,b)$ for the number of children
$v$ of the root in $b$ such that $b_v = a$. A measure $\mu$ on $\mathcal{X}(k)$ is called shift-invariant if

$$\mu \circ \pi_{k,k-1}^{-1}(a) = \sum_{b \in \mathcal{X}(k)} m_k(a,b)\, \mu(b), \quad \text{for all } a \in \mathcal{X}(k-1). \qquad (3.22)$$

We equip the space $\mathcal{M}(\mathcal{X}(k))$ of probability measures on $\mathcal{X}(k)$ with the smallest topology which makes
the functionals $\mu \mapsto \int f\, d\mu$ continuous for each bounded $f : \mathcal{X}(k) \to \mathbb{R}$ (since the maximal degree is
bounded in $\mathcal{X}(k)$, it follows that $\mu \mapsto \int m_k(a,x)\, d\mu(x)$ is also continuous for each $a \in \mathcal{X}(k-1)$).

Define $\mu \circ \pi_{k,k-1}^{-1} \otimes_1 Q$ as the measure on $\mathcal{X}(k)$ obtained by providing children for each vertex of the
$(k-1)$-th generation, independently according to the transition mechanism $Q$, and define the function

$$J_k(\mu) = \begin{cases} H(\mu \,\|\, \mu \circ \pi_{k,k-1}^{-1} \otimes_1 Q) & \text{if } \mu \text{ is shift-invariant}, \\ \infty & \text{otherwise}, \end{cases}$$

on $\mathcal{M}(\mathcal{X}(k))$. Note that $J_1(\cdot)$ coincides with the good rate function $J(\cdot)$ of Theorem 2.2.

Lemma 3.8. Suppose that $X$ is an irreducible, critical multitype Galton-Watson tree with uniformly
bounded offspring sizes, conditioned to have exactly $n$ vertices. Then, for $n \to \infty$, the $k$-generation
empirical offspring measure $M^k_X$ satisfies a large deviation principle in $\mathcal{M}(\mathcal{X}(k))$ with speed $n$ and
convex, good rate function $J_k(\cdot)$.

Proof. For $l \ge 0$ let $\mathcal{X}\{l\} \subset \mathcal{X}(l)$ be the support of $\pi_l(X)$ for a multitype Galton-Watson tree $X$
corresponding to the transition mechanism $Q$ starting at any strictly positive measure for $X(\rho)$. Let
$\mathcal{X}_m\{l\}$ be the partition of $\mathcal{X}\{l\}$ according to the height $m = 0, 1, \ldots, l$ of the tree. Let

$$\mathcal{I} : \mathcal{X}\{k\} \to \mathcal{X}\{k-1\} \times \mathcal{X}\{k-1\}^* \quad \text{given by} \quad \mathcal{I}_1(b) = \pi_{k,k-1}(b) \in \mathcal{X}\{k-1\}, \quad \mathcal{I}_2(b) = (n, b_{v_1}, \ldots, b_{v_n}) \in \mathcal{X}\{k-1\}^*,$$

where $v_1, \ldots, v_n$ are the vertices in the first generation of $b \in \mathcal{X}\{k\}$ ordered from left to right.

To prove Lemma 3.8 we intend to apply Theorem 2.2 to a multitype Galton-Watson tree $\tilde X$ on the
enlarged finite type space $\mathcal{X}\{k-1\}$. We mark the objects related to this new tree by a tilde.

The process $\tilde X$ is constructed by choosing $\tilde X(\rho)$ using the law of $\pi_{k-1}(X)$, and the offspring number
and types of a vertex $v$ as $\tilde C(v) = \mathcal{I}_2(b)$ for the typed tree $b \in \mathcal{X}\{k\}$ obtained by providing children
for each vertex in generation $k - 1$ of $\tilde X(v)$ independently according to the transition mechanism $Q$.

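With $Q$ irreducible, it is easy to check that any $a \in \mathcal{X}\{k-1\}$ can be reached by finitely many steps of
the transition mechanism $\tilde Q$ for $\tilde X$ starting at any $b \in \mathcal{X}_{k-1}\{k-1\}$. Further, if $b \in \mathcal{X}_l\{k-1\}$ for some
$l < k - 1$, then $\tilde Q\{\,\cdot \mid b\}$ is supported by $\bigcup_{n=0}^{N_0} \{n\} \times \big(\bigcup_{m < l} \mathcal{X}_m\{k-1\}\big)^n$, implying that $\tilde A(a,b) = 0$ whenever
$a \in \mathcal{X}_m\{k-1\}$ for some $m \ge l$. Consequently, $\tilde Q$ is weakly irreducible on $\mathcal{X}\{k-1\}$. Let $\mu_0$ denote
the Perron-Frobenius eigenvector of the irreducible matrix $A$, normalized to be a strictly positive
probability vector on $\mathcal{X}$. Then, $\mu_l = \mu_{l-1} \otimes_1 Q$ for $l \ge 1$ are strictly positive probability vectors on
$\mathcal{X}\{l\}$, such that $\mu_l \circ \pi_{l,l-1}^{-1} = \mu_{l-1}$ for all $l \ge 1$. Moreover, with $\mu_0$ the right eigenvector corresponding
to the eigenvalue 1 of the matrix $A$, it follows by induction on $l \ge 1$ that $\mu_l$ are shift-invariant on
$\mathcal{X}(l)$. In particular, for any $a \in \mathcal{X}\{k-1\}$,

$$\sum_{b \in \mathcal{X}\{k-1\}} \tilde A(a,b)\, \mu_{k-1}(b) := \sum_{b \in \mathcal{X}\{k-1\}} \sum_{c} m(a,c)\, \tilde Q(c \mid b)\, \mu_{k-1}(b) = \sum_{\tilde b \in \mathcal{X}\{k\}} m_k(a, \tilde b)\, \mu_{k-1} \otimes_1 Q(\tilde b) = \mu_{k-1}(a).$$

With $\mu_{k-1}$ a strictly positive right eigenvector for the eigenvalue 1 and the matrix $\tilde A$, we see that
$\tilde Q$ is also critical. Consequently, we have from Theorem 2.2 that $M_{\tilde X}$ satisfy the large deviation
principle in $\mathcal{M}(\mathcal{X}\{k-1\} \times \mathcal{X}\{k-1\}^*)$ with the good rate function $J(\cdot)$ corresponding to $\tilde Q$. For each
$\nu_1 \in \mathcal{M}(\mathcal{X}\{k-1\})$ the measure $\nu_1 \otimes \tilde Q$ is supported on the closed (finite) set $\mathcal{I}(\mathcal{X}\{k\})$. Consequently,
$M_{\tilde X}$ is supported on $\mathcal{I}(\mathcal{X}\{k\})$ as is any $\nu$ for which $J(\nu) < \infty$, allowing us to restrict this large deviation
principle to $\mathcal{M}(\mathcal{I}(\mathcal{X}\{k\}))$. Identifying $\mathcal{M}(\mathcal{I}(\mathcal{X}\{k\}))$ with $\mathcal{M}(\mathcal{X}\{k\})$ via the mapping $\mu = \nu \circ \mathcal{I}$, the law
of $M_{\tilde X}$ is exactly mapped to that of $M^k_X$. Moreover, $\nu \in \mathcal{M}(\mathcal{I}(\mathcal{X}\{k\}))$ is shift-invariant if and only if $\mu$
is shift-invariant on $\mathcal{X}(k)$ as defined in (3.22), with $\nu_1 = \mu \circ \pi_{k,k-1}^{-1}$ and $(\nu_1 \otimes \tilde Q) \circ \mathcal{I} = (\mu \circ \pi_{k,k-1}^{-1}) \otimes_1 Q$.

This leads to the large deviation principle for $M^k_X$ with the good rate function $J_k(\cdot)$, restricted to
$\mathcal{M}(\mathcal{X}\{k\})$.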

To complete the proof it suffices to check that any shift-invariant measure $\mu \in \mathcal{M}(\mathcal{X}(k))$ with
$\mu \ll \mu \circ \pi_{k,k-1}^{-1} \otimes_1 Q$ in $\mathcal{M}(\mathcal{X}(k))$ is supported by $\mathcal{X}\{k\}$. To this end, fix a shift-invariant $\mu$ in $\mathcal{M}(\mathcal{X}(k))$
and note that $\int N[m]\, d\mu = 1$ for $m = 1, \ldots, k$. Hence we can associate shifted probability measures
$S^m(\mu) \in \mathcal{M}(\mathcal{X}(k-m))$ with $\mu$ such that $S^0(\mu) = \mu$, $S^m(\mu) = S(S^{m-1}(\mu))$ for $m = 1, \ldots, k$, and
$S(\mu)$ is defined as in (2.3). The shift-invariance of $\mu$ implies that $S^m(\mu) \circ \pi_{k-m,1}^{-1}$ is independent of
$m = 0, \ldots, k - 1$. Recall that the measure $S^{k-1}(\mu)$ of each $(a,c) \in \mathcal{X}(1)$ is the expectation under $\mu$ of
the number of vertices of generation $k - 1$ of the tree whose type is $a \in \mathcal{X}$ and which have offspring
$c \in \mathcal{X}^*$. Our assumption that $\mu \ll \mu \circ \pi_{k,k-1}^{-1} \otimes_1 Q$ thus implies that the support of $S^{k-1}(\mu)$ is a subset
of the support of $\mu_1$, which is $\mathcal{X}\{1\}$. Consequently, $S^m(\mu) \circ \pi_{k-m,1}^{-1}$ are supported by $\mathcal{X}\{1\}$ for all
$m = 0, \ldots, k - 1$, which implies that $\mu$ is supported by $\mathcal{X}\{k\}$ as claimed. □

To move from the empirical $k$-generation offspring measures $M^k_X$ to the empirical subtree measure $T_X$
we use the Dawson-Gärtner theorem, see e.g. [DZ98, Theorem 4.6.1]. Note that the spaces $\mathcal{X}(k)$ and
the canonical projections $\pi_{k,l}$, $k \ge l$, form a projective system of Polish spaces and that the projective
limit coincides with the Polish space $\mathbb{X}$.

Similarly, the probability measures on $\mathcal{X}(k)$ with the projections $\pi^*_{k,l}$ defined by $\pi^*_{k,l}(\mu) = \mu \circ \pi_{k,l}^{-1}$ form a
projective system and the projective limit is the Polish space $\mathcal{M}(\mathbb{X})$ described before Theorem 2.3 and
the canonical projections $\pi^*_k : \mathcal{M}(\mathbb{X}) \to \mathcal{M}(\mathcal{X}(k))$ can be defined by $\pi^*_k(\mu) = \mu \circ \pi_k^{-1}$. Details follow
from an argument similar to the one given in [DZ98, Lemma 6.5.14]. Recalling that $M^k_X = T_X \circ \pi_k^{-1}$,
the Dawson-Gärtner theorem yields the following corollary of Lemma 3.8 (see for example [DZ98,
Corollary 6.5.15] for a similar derivation).

Corollary 3.9. Suppose that $X$ is an irreducible, critical multitype Galton-Watson tree with uniformly
bounded offspring sizes, conditioned to have exactly $n$ vertices. Then, for $n \to \infty$, the empirical subtree
measure $T_X$ satisfies a large deviation principle in $\mathcal{M}(\mathbb{X})$ with speed $n$ and convex, good rate function

$$\tilde K(\mu) = \sup_{k \ge 1} J_k(\mu \circ \pi_k^{-1}).$$

To complete the proof of Theorem 12. 31 it just remains to show that K{-) = K(-). For this purpose first 
assume that € A4(X) is shift-invariant. Then, for each k > 1 and a G X{k — 1), 

r N 
J i=i 

= d/j 1 (X)m k (a,ir k X)= ^ fi o ir^Q)) m k (a, b). 
J bex(k) 

In other words, for each $k \ge 1$, the measure $\mu \circ \pi_k^{-1}$ is shift-invariant in $\mathcal{M}(\mathcal{X}(k))$. Conversely, if
$\mu \circ \pi_k^{-1}$ is shift-invariant in $\mathcal{M}(\mathcal{X}(k))$ for every $k \ge 1$, the same calculation shows that $\mu = S(\mu)$
on the collection of sets of the form $\pi_k^{-1}(A)$ for any $k \ge 1$ and $A \subset \mathcal{X}(k)$. As this collection of sets
is closed under finite intersections and generates the Borel $\sigma$-field on $\mathcal{X}$, we infer that $\mu$ itself is
shift-invariant.

Recall the definition of the projections $p_0, p_1$ for backward trees. For the proof of Theorem 2.3 it only
remains to verify the following lemma.

Lemma 3.10. For every shift-invariant probability measure $\mu$ on $\mathcal{X}$ we have

$$H(\mu^* \circ p_1^{-1} \,\|\, \mu^* \circ p_0^{-1} \otimes Q) = \sup_{k \ge 2} H(\mu \circ \pi_k^{-1} \,\|\, \mu \circ \pi_{k-1}^{-1} \otimes_1 Q). \tag{3.23}$$

Proof. Define projections $\pi_k^j : \mathcal{X} \to \mathcal{X}(k)$ as follows: Order the vertices $v_1, v_2, \ldots$ in generation $k-1$ of
$x \in \mathcal{X}$ from left to right, with $v_1$ the leftmost. The tree $\pi_k^j(x)$ is obtained by removing all vertices in
generations exceeding $k$ and all vertices in generation $k$ whose parent is some $v_l$, $l \ge j$. In particular,
$\pi_k^1(x) = \pi_{k-1}(x)$ and $\pi_k^j(x) = \pi_k(x)$ for all $j > N[k-1](x)$. Let $\mu \circ (\pi_k^j)^{-1} \otimes_j Q$ denote the measure
obtained by sampling $X$ according to $\mu$ and then independently adding offspring according to $Q$ to
each of the vertices $v_l$ for $l \ge j$ in generation $k-1$ of $\pi_k^j(X)$. Observe that we define this measure
for all $j$ and that in many cases no vertices in generation $k$ are removed or added. Assume first that
$\mu \circ \pi_k^{-1} \ll \mu \circ \pi_{k-1}^{-1} \otimes_1 Q$. Then, in case $\mu \circ \pi_k^{-1}(x) > 0$ and $N[k-1](x) = n \ge 1$ we find that

$$\frac{\mu \circ \pi_k^{-1}(x)}{\mu \circ \pi_{k-1}^{-1} \otimes_1 Q(x)} = \prod_{j=1}^{n} \frac{\mu \circ (\pi_k^{j+1})^{-1} \otimes_{j+1} Q(x)}{\mu \circ (\pi_k^{j})^{-1} \otimes_j Q(x)}, \tag{3.24}$$



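with all the terms on the right hand side positive. Recall the definition of the measure $\mu_{k-1} = \mu^* \circ p_{k-1}^{-1}$
and the projections $p_{0,k-1}$, $p_{1,k-1}$ on $\mathcal{X}[k-1]$, and also recall that $(y, v) \in \mathcal{X}[k-1]$ denotes the tree
$y \in \mathcal{X}$ with centre $v$ in generation $k-1$ of $y$. Hence, for $1 \le j \le n$,

$$\frac{\mu \circ (\pi_k^{j+1})^{-1} \otimes_{j+1} Q(x)}{\mu \circ (\pi_k^{j})^{-1} \otimes_j Q(x)} = \frac{\mu_{k-1} \circ p_{1,k-1}^{-1}(\pi_k(x), v_j)}{\mu_{k-1} \circ p_{0,k-1}^{-1} \otimes Q(\pi_k(x), v_j)}, \tag{3.25}$$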



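with all terms positive. Note that if $N[k-1](x) = 0$ then $\mu \circ \pi_k^{-1}(x) = \mu \circ \pi_{k-1}^{-1} \otimes_1 Q(x)$, whereas if
$y = \pi_k^j(x)$ with $N[k-1](x) = n > 0$ then $N[k-1](y) = n$ and $\mu \circ (\pi_k^j)^{-1}(y) = \mu_{k-1} \circ p_{0,k-1}^{-1}(y, v_j)$ for any
$1 \le j \le n$. Hence, (3.24) and (3.25) imply that

$$H(\mu \circ \pi_k^{-1} \,\|\, \mu \circ \pi_{k-1}^{-1} \otimes_1 Q) = \sum_{x \in \mathcal{X}(k)} \mu \circ \pi_k^{-1}(x) \sum_{j=1}^{N[k-1](x)} \log \frac{\mu_{k-1} \circ p_{1,k-1}^{-1}(\pi_k(x), v_j)}{\mu_{k-1} \circ p_{0,k-1}^{-1} \otimes Q(\pi_k(x), v_j)}$$

$$= H(\mu_{k-1} \circ p_{1,k-1}^{-1} \,\|\, \mu_{k-1} \circ p_{0,k-1}^{-1} \otimes Q). \tag{3.26}$$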

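Finally, note that

$\mu \circ \pi_k^{-1}(x) > 0$ and $\mu \circ \pi_{k-1}^{-1} \otimes_1 Q(x) = 0$ for some $x \in \mathcal{X}(k)$,
if and only if there exists $1 \le j \le N[k-1](x)$ such that

$\mu_{k-1} \circ p_{1,k-1}^{-1}(\pi_k(x), v_j) > 0$ and $\mu_{k-1} \circ p_{0,k-1}^{-1} \otimes Q(\pi_k(x), v_j) = 0$.

Consequently, $\mu \circ \pi_k^{-1} \ll \mu \circ \pi_{k-1}^{-1} \otimes_1 Q$ if and only if $\mu_{k-1} \circ p_{1,k-1}^{-1} \ll \mu_{k-1} \circ p_{0,k-1}^{-1} \otimes Q$, with (3.26)
holding for any shift-invariant $\mu \in \mathcal{M}(\mathcal{X})$ and $k \ge 2$. By the identities $p_k \circ p_0 = p_{0,k} \circ p_k$ and
$p_k \circ p_1 = p_{1,k} \circ p_k$ this amounts to

$$H(\mu \circ \pi_k^{-1} \,\|\, \mu \circ \pi_{k-1}^{-1} \otimes_1 Q) = H\bigl(\mu^* \circ p_1^{-1} \circ p_{k-1}^{-1} \,\|\, (\mu^* \circ p_0^{-1} \otimes Q) \circ p_{k-1}^{-1}\bigr). \tag{3.27}$$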

The variational characterization of the relative entropy states that, for two probability measures $\nu_1, \nu_2$
on the Polish space $\underline{\mathcal{X}}$,

$$H(\nu_1 \circ p_k^{-1} \,\|\, \nu_2 \circ p_k^{-1}) = \sup_{\phi \in C_b(\mathcal{X}[k])} \Bigl\{ \int_{\underline{\mathcal{X}}} \phi \circ p_k \, d\nu_1 - \log \int_{\underline{\mathcal{X}}} e^{\phi \circ p_k} \, d\nu_2 \Bigr\},$$

where $C_b(\mathcal{X}[k])$ is the set of continuous, bounded functions on $\mathcal{X}[k]$ (see for example [DZ98, Lemma
6.2.13]). This expression is increasing in $k$, because every test function $\phi \circ p_k$ factors through $p_{k+1}$, and by the same representation it is bounded by
$H(\nu_1 \,\|\, \nu_2)$, which together with (3.27) shows that the left hand side of (3.23) is at least as large as its
right hand side.
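
The two facts used in this step can be checked numerically on a toy example. The sketch below is an illustration under simplifying assumptions, not part of the proof: it replaces the space of backward trees by a four-point set, uses a hypothetical coarsening map `proj` in place of $p_k$, and the measures `nu1`, `nu2` are made-up numbers. It evaluates the variational functional at the maximiser $\phi = \log(d\nu_1/d\nu_2)$ and at random test functions, and compares the entropy of the projected measures with the full entropy.

```python
import math
import random

def relative_entropy(nu1, nu2):
    """H(nu1 || nu2) on a finite space; assumes nu2 > 0 wherever nu1 > 0."""
    return sum(p * math.log(p / q) for p, q in zip(nu1, nu2) if p > 0)

def dv_functional(phi, nu1, nu2):
    """Value of  int phi dnu1 - log int exp(phi) dnu2  for a test function phi."""
    mean = sum(f * p for f, p in zip(phi, nu1))
    log_mgf = math.log(sum(math.exp(f) * q for f, q in zip(phi, nu2)))
    return mean - log_mgf

def pushforward(nu, proj, size):
    """Image measure nu o proj^{-1} on {0, ..., size - 1}."""
    out = [0.0] * size
    for point, mass in enumerate(nu):
        out[proj[point]] += mass
    return out

# Toy data (illustrative assumptions, not taken from the text).
nu1 = [0.4, 0.2, 0.3, 0.1]
nu2 = [0.25, 0.25, 0.25, 0.25]
proj = [0, 0, 1, 1]  # hypothetical projection onto a two-point space

h_full = relative_entropy(nu1, nu2)
h_coarse = relative_entropy(pushforward(nu1, proj, 2), pushforward(nu2, proj, 2))

# The supremum in the variational formula is attained at phi = log(dnu1/dnu2).
phi_star = [math.log(p / q) for p, q in zip(nu1, nu2)]

# Random test functions never exceed the relative entropy.
rng = random.Random(0)
random_values = [
    dv_functional([rng.uniform(-3.0, 3.0) for _ in nu1], nu1, nu2)
    for _ in range(200)
]

print("projected entropy <= full entropy:", h_coarse <= h_full)
print("maximiser attains H:", abs(dv_functional(phi_star, nu1, nu2) - h_full) < 1e-12)
print("random test functions stay below H:", max(random_values) <= h_full + 1e-12)
```

Restricting the supremum to functions that are constant on the fibres of `proj` gives the entropy of the projected measures, which is why that quantity can only grow as the projection becomes finer.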

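Conversely, for any continuous bounded function $\phi : \underline{\mathcal{X}} \to \mathbb{R}$ and $\varepsilon > 0$ there exists a uniformly
continuous function $\psi : \underline{\mathcal{X}} \to \mathbb{R}$ such that

$$\Bigl| \log \int_{\underline{\mathcal{X}}} e^{\phi} \, d\nu_2 - \log \int_{\underline{\mathcal{X}}} e^{\psi} \, d\nu_2 \Bigr| < \varepsilon \quad \text{and} \quad \Bigl| \int_{\underline{\mathcal{X}}} \phi \, d\nu_1 - \int_{\underline{\mathcal{X}}} \psi \, d\nu_1 \Bigr| < \varepsilon.$$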
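Moreover, with $\underline{\mathcal{X}}$ being the projective limit of $\mathcal{X}[k]$, we can find a $k \ge 1$ and a continuous, bounded
function $\psi_k : \mathcal{X}[k] \to \mathbb{R}$ such that $|\psi_k \circ p_k(x) - \psi(x)| < \varepsilon$ for all $x \in \underline{\mathcal{X}}$. Hence

$$\sup_{k \ge 2} \sup_{\phi \in C_b(\mathcal{X}[k])} \Bigl\{ \int_{\underline{\mathcal{X}}} \phi \circ p_k \, d\nu_1 - \log \int_{\underline{\mathcal{X}}} e^{\phi \circ p_k} \, d\nu_2 \Bigr\} \ge \sup_{\phi \in C_b(\underline{\mathcal{X}})} \Bigl\{ \int_{\underline{\mathcal{X}}} \phi \, d\nu_1 - \log \int_{\underline{\mathcal{X}}} e^{\phi} \, d\nu_2 \Bigr\},$$

which together with (3.27) shows that the right hand side of (3.23) is at least as large as its left hand
side. This completes the proof of the lemma. □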
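
The partial truncations $\pi_k^j$ used in the proof can be made concrete on a small example. The following is a minimal sketch, assuming a typed ordered tree is encoded as a nested tuple `(type, children)`; this encoding and the function names `truncate`, `count_generation` and `partial_truncate` are choices made for this illustration and do not come from the text. It checks the two boundary cases stated in the proof: $\pi_k^1 = \pi_{k-1}$, and $\pi_k^j = \pi_k$ once $j$ exceeds the number of vertices in generation $k-1$.

```python
def truncate(tree, k):
    """pi_k: keep generations 0, ..., k (the root is generation 0)."""
    label, children = tree
    if k == 0:
        return (label, ())
    return (label, tuple(truncate(child, k - 1) for child in children))

def count_generation(tree, g):
    """N[g](x): number of vertices in generation g."""
    _, children = tree
    if g == 0:
        return 1
    return sum(count_generation(child, g - 1) for child in children)

def partial_truncate(tree, k, j):
    """pi_k^j for k >= 1: drop generations beyond k, and drop the generation-k
    children of v_l for every l >= j, where v_1, v_2, ... are the vertices of
    generation k - 1 listed from left to right."""
    seen = [0]  # left-to-right index of the last generation-(k-1) vertex visited

    def walk(node, depth):
        label, children = node
        if depth == k - 1:
            seen[0] += 1
            if seen[0] >= j:
                return (label, ())  # offspring of v_l removed for l >= j
            return (label, tuple((child[0], ()) for child in children))
        # Children are visited in order, so generation k - 1 is met left to right.
        return (label, tuple(walk(child, depth + 1) for child in children))

    return walk(tree, 0)

# Example tree with types as strings: generations {a}, {b, e}, {c, d, f}, {g}.
x = ("a", (("b", (("c", (("g", ()),)), ("d", ()))), ("e", (("f", ()),))))

k = 2
n = count_generation(x, k - 1)  # number of vertices in generation k - 1
print(partial_truncate(x, k, 1) == truncate(x, k - 1))
print(partial_truncate(x, k, n + 1) == truncate(x, k))
print(partial_truncate(x, k, 2))
```

Stepping $j$ from $1$ to $n+1$ restores the offspring of $v_1, \ldots, v_n$ one vertex at a time, which is the mechanism behind the telescoping product over $j$ in the proof.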